FOREWORD



The practice of comparing one individual with 
another is as old as recorded history. Man's 
earliest writings are replete with statements in- 
dicating that he has long viewed his fellow man in 
terms of whether or not he measured up to an 
expected ideal. Similarly, the performance of a 
man has traditionally been described in terms of 
how it compares with that of another man.
However, subjecting these "known" differences to 
the scientific method of inquiry is a recent 
development. 

In the area of individual differences in
behavior and psychological characteristics, re-
search has progressed from the simple to the
complex. The first studies dealt with such simple
functions as speed of reaction. Today, studies
are aimed at measuring individual differences in
the complex functions of motivation, ego integra-
tion, and cognition.

The technology for measuring behavior has
developed in a similar manner. Instruments are
available which, most scientists will agree,
accurately measure the speed with which an
individual taps his finger in response to a given
signal. Scientists do not agree, however, on the
adequacy of the instruments used to measure
individual differences in intelligence. There is
even some disagreement over the use of the word
"intelligence" to describe certain aspects of
behavior.

Because of the present state of the art of
psychological measurement, studies such as those
conducted by the Health Examination Survey
encounter difficult problems in attempting to esti-
mate the prevalence of various mental health
factors in the population.

The Health Examination Survey is part of the 
U.S. National Health Survey, authorized by 
Congress in 1956 to collect information about the 
Nation's health. Data are collected by direct
examinations of individual persons chosen to 
constitute a probability sample of some segment of 
the total population of the United States. 

The first sample represented the adult popu-
lation aged 18 through 79 years. Since the study
was primarily concerned with the prevalence of
chronic physical disease, the examination did not
include psychological measurements. The second
sample consisted of noninstitutionalized children
ages 6 through 11, among whom the incidence of
chronic disease is insignificant. The important
health factors in this group are found in those
functions which result in growth and development.
These, then, were the factors to be studied.

Many authorities in the field of growth and 
development contributed to the planning phase of 
the Survey. Although they generally agreed on what 
factors should be measured, they could not agree 
on how the measurements should be obtained. They 
did conclude that present instruments were inade- 
quate but that these were the only tools available. 

The tests which are discussed in the following 
report were those selected for use by the Health 
Examination Survey. In choosing these instru-
ments, primary consideration was given to those
which best met the following criteria: 

1. They were capable of yielding data in 
those areas considered most important 
to the study of growth and development. 

2. They would produce data in a form which 
would be meaningful to the individuals 
responsible for children's health. 

3. They were suitable for use in a survey 
operation where examiners change fre- 
quently, where only 1 hour is available 
to conduct the examination, and where 
examining conditions are less than opti- 
mal. 



The selected instruments are not ideal, but 
they are felt to be the best compromise offered 
by the present state of the art of measurement. 

How much was compromised? What can be 
said about the growth and development of chil- 
dren from the data obtained by the use of these 
instruments? 

Through a contractual arrangement with Dr. 
Sells, the first step has been taken in answering 
these questions. 



Lois R. Chatham, Ph.D.
Psychological Advisor
Division of Health Exam-
ination Statistics
IN THIS REPORT the psychological procedures used in the Health Ex-
amination Survey conducted between June 1963 and December 1965 for
children ages 6 through 11 are critically evaluated.

In his analysis, the author combines his own professional competence
with the information obtained in an extensive survey of literature per-
taining to the four procedures used: the Wechsler Intelligence Scale for
Children, the Wide Range Achievement Test, a modification of the Draw-
A-Man Test, and the Thematic Apperception Test. The result is an
evaluation of the instruments in terms of their validity,
reliability, and applicability for use in the Health Examination Survey.

Finally, the author points out the strengths and weaknesses of each pro-
cedure and makes recommendations concerning the eventual use of data
obtained in the Survey.



SYMBOLS 

Data not available 

Category not applicable 

Quantity zero

Quantity more than 0 but less than 0.05 

Figure does not meet standards of 
reliability or precision 



EVALUATION OF 

PSYCHOLOGICAL MEASURES 
USED IN THE HEALTH EXAMINATION SURVEY 

OF CHILDREN AGES 6-11 

S. B. Sells, Ph.D., Institute of Behavioral Research, Texas Christian University

INTRODUCTION 



This report is the outcome of a contract with 
the National Center for Health Statistics. The 
purpose of the contract was to obtain an objective 
critical evaluation of the psychological procedures 
chosen for use in the Health Examination Survey
of children ages 6 through 11. The objectives may
be summarized as follows: 

1. To prepare a critical review concerning 
the development and use of the psycholog- 
ical procedures used in Cycle II based on 
available literature and unpublished re-
ports (theses, dissertations, and others).
These measures include the Vocabulary
and Block Design subtests of the Wechsler
Intelligence Scale for Children, the Oral 
Reading and Arithmetic subtests of the 
Wide Range Achievement Test (1963 edi- 
tion), the Draw-A-Man Test, and cards 
1, 2, 5, 8BM, and 16 of the Thematic Ap- 
perception Test. 

2. To make recommendations concerning the
appropriate inferences which can be drawn about
individual growth and development on the basis
of scores derived from the test battery
described above.

3. To recommend what research must be
done if the objectives of the Health Ex-
amination Survey are to be accomplished.

4. To make original recommendations con-
cerning the types of cross-disciplinary
analyses that can be performed on data
obtained in the Health Examination Survey
of children.

An extensive survey of the literature was
made, but only the most relevant material was
included in this final report. Literature was con-
sidered relevant if it was either empirical re-
search or a review which included or made ref-
erence to the tests used in the Survey. Empirical
studies which were conducted on samples of U.S.
children ages 6 to 12 years were given preference.
A few important reports which did not meet these
criteria were included because of their method-
ological features or their significant content. Un-
published master's theses and dissertations were
obtained, as extensively as possible, by inter-
library loan. Information was sought and, with
some success, obtained from the publishers and
selected users of the reviewed tests.

One empirical study was carried out under
this contract. Its results are included in the sec-
tion on the Goodenough Draw-A-Man Test. The
study was stimulated by a recent publication by
Dale B. Harris entitled Children's Drawings as
Measures of Intellectual Maturity. This text is
basically a revision of the 1926 book by Florence
L. Goodenough entitled Measurement of Intelli-
gence by Drawings. In his publication Harris in-
cludes new point-score scales and modernized
norms for scoring drawings of the human figure.
The text of this report is divided into six
sections. Sections I-IV present critical discus- 
sions of various tests used by the Health Examina- 
tion Survey. The tests are discussed in the follow- 
ing order: 

I. The Wechsler Intelligence Scale for
Children, Vocabulary and Block Design 
subtests 

II. The Wide Range Achievement Test, the 
Oral Reading and Arithmetic subtests 

III. The Goodenough Draw-A-Man Test 

IV. The Thematic Apperception Test

Section V briefly discusses some of the issues 
which arise when these tests are used as a bat- 
tery. Finally, section VI considers the cross-
disciplinary relationships between "psychologi-
cal" and "nonpsychological" measures.

Each research study or review referred to
in this report is identified by a number placed in
parentheses immediately following the cited ref-
erence. Bibliographies following each of the first
four sections of the report contain all references
cited in the respective sections.

Research studies which were abstracted as 
part of the literature-review portion of this con-
tract are also included in the four bibliographies.
The actual abstracts of the reviewed literature 
appear as appendixes to the report. For conven- 
ience, numbers which identify the abstracts cor- 
respond to the number given when the reference 
is cited in the text of the report. 

These abstracts have been deposited as docu- 
ment number 8486 with the Library of Congress. 
A copy may be secured by sending the document 
number and $28.80 for photoprints or $3.20 for 
35mm. microfilm to the American Documenta- 
tion Institute Auxiliary Publication Project, Pho- 
toduplication Service, Library of Congress, Wash- 
ington, D.C., 20541. Advance payment is required. 
Checks or money orders should be made payable 
to Chief, Photoduplication Service, Library of
Congress. 



I. THE WECHSLER INTELLIGENCE SCALE FOR CHILDREN, 
THE VOCABULARY AND BLOCK DESIGN SUBTESTS 



This section reviews the measurement char- 
acteristics of the Vocabulary (Voc.) and Block 
Design (BD) subtests of the Wechsler Intelli- 
gence Scale for Children (WISC), both as a sepa- 
rate unit and as a WISC short form. It also reviews 
behavioral correlates of intelligence as reported
in the literature and critically evaluates the appro- 
priateness of their use in Cycle II of the Health 
Examination Survey. 

The selection of the Vocabulary and Block 
Design subtests for use as part of the psycho- 
logical test battery for Cycle II, in effect, treats 
these subtests as a short form of the WISC. In 
addition to providing an estimate of the WISC 
score, the two subtests may be interpreted sepa- 
rately, in combination with other test scores, or 
in conjunction with other Survey data. Combina-
tions of these measures with other data obtained
in the Survey are discussed in section 11. 



DESCRIPTION OF THE WISC 

The WISC, which was published in 1949, 
extended the well-known Wechsler intelligence 
scales for adolescents and adults into the child- 
hood range of 5 to 15 years. During the decade 
and a half since its publication the WISC has
been the subject of extensive investigation and 
has achieved wide school and clinic use where 
individual measures of intelligence are desired. 

The WISC is patterned after the Wechsler-
Bellevue Intelligence Scale both in the structure
of the subtests and the scales and in the use of
the deviation intelligence quotient. The test con-
sists of 12 subtests, 6 Verbal and 6 Perform-
ance, of which 2 (Digit Span of the Verbal Scale
and Mazes of the Performance Scale) are supple-
mentary and not routinely used. The 5 subtests
comprising the Verbal Scale are as follows:
Information, Comprehension, Arithmetic, Simi-
larities, and Vocabulary. The 5 Performance Scale
subtests are Picture Completion, Picture Ar-
rangement, Block Design, Object Assembly, and
Coding (Digit Symbols).

An important innovation in the Wechsler in- 
telligence tests is the use of the deviation IQ. 
This device supplants the mental age concept and 
evaluates the performance of each individual on 
the basis of the distribution of scores of a repre- 
sentative sample of his own chronological age. In 
the standardization of the WISC, Wechsler kept 
the standard deviation of intelligence quotients 
constant from year to year, with the result that
"a child's obtained IQ does not vary unless his
actual test performance as compared with his
peers varies."

Raw scores for each subtest are converted 
to scaled scores which have a mean of 10 and 
standard deviation of 3 for each age level. The 
sum of five scaled scores for the Verbal Series 
constitutes the Verbal Scale score (VS), and simi-
larly the Performance Scale score (PS) is the sum
of the five Performance Series scaled scores. The 
Full Scale score (FS) is the sum of the Verbal 
Scale and the Performance Scale. Deviation in- 
telligence quotients have been derived by a sim- 
ilar conversion process for VS, PS, and FS. The 
IQ scales at each age have a mean of 100 and 
standard deviation of 15. 
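The two rescalings just described (raw score to scaled score with mean 10 and SD 3, and sum of scaled scores to deviation IQ with mean 100 and SD 15) can be sketched as simple linear transformations. This is a minimal illustration only, assuming a purely linear conversion from an age group's mean and standard deviation; the WISC itself uses Wechsler's published conversion tables, and the norm values in the example (a raw-score mean of 24 with SD 6, and a scaled-score-sum mean of 100 with SD 20) are hypothetical, not WISC values.

```python
def scaled_score(raw: float, age_mean: float, age_sd: float) -> int:
    """Rescale a raw subtest score to a scale with mean 10 and SD 3.

    Assumes a purely linear transformation based on the raw-score mean and
    SD of the child's own age group (hypothetical norms, not WISC tables).
    """
    z = (raw - age_mean) / age_sd
    return round(10 + 3 * z)


def deviation_iq(sum_of_scaled: float, age_mean: float, age_sd: float) -> int:
    """Rescale a sum of scaled scores to a deviation IQ (mean 100, SD 15).

    Because the mean and SD are held fixed within each age group, the IQ
    changes only if the child's standing relative to age peers changes.
    """
    z = (sum_of_scaled - age_mean) / age_sd
    return round(100 + 15 * z)


# Hypothetical norms for a single age group (illustrative values only).
print(scaled_score(raw=30, age_mean=24.0, age_sd=6.0))
print(deviation_iq(sum_of_scaled=120, age_mean=100.0, age_sd=20.0))
```

With these invented norms, a raw score one SD above the age-group mean maps to a scaled score of 13, and a scaled-score sum one SD above its mean maps to an IQ of 115.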

The standardization of the WISC is reported
in Wechsler's manual (101), and the standardiza-
tion sample is summarized in terms of age, sex,
geographic representation, urban-rural compo-
sition, and composition by socioeconomic status
(reflected by occupation of fathers). The WISC
was standardized on a total sample of 2,200 cases,
including 100 white boys and 100 white girls at
each age from 5 to 15 years. The proportion of
urban children in the sample was slightly higher
than in comparable United States population sta-
tistics.

Reviewers have commented very favorably
on the WISC as a test of superior quality (102-104),
but, as in all areas of mental measurement,
imperfections have been noted and users have
attempted to employ it for purposes for which it
was not specifically designed. In general, the
deviation IQ has been accepted as an improvement
over the IQ computed by dividing mental age by
chronological age. Except for a slight bias toward
urban and small-town areas, as opposed to rural
areas, for a native white population, the sampling
basis of the WISC has been regarded as good.

Maxwell (106) and Wilson (139) have
criticized the linearity of the transformation of
raw scores to scaled scores, which may be a
problem when sampling extreme cases and widely
varying regional, ethnic, and linguistic groups.
Hite (112) reported that the WISC lacks items of
middle-range difficulty at all age levels and is too
difficult for young children, particularly those in
the age range 5 to 6 years. In the studies reviewed,
WISC Full Scale IQ's have indeed tended to be
lower than comparable Stanford-Binet IQ's. This
is especially true at the lower age levels. McCand-
less (103) noted that girls tend to test lower than
boys on the WISC, but support for this generali-
zation is equivocal in the present review.

In evaluating the utility of the Vocabulary and
Block Design short form of the WISC for the Survey,
it is appropriate to consider shortcomings of these
tests in relation to alternatives that might have
been considered, given the constraints of testing
time available in the Survey schedule and the
general problems of a national survey. It may be
noted that although the WISC norms are inappro-
priate in varying degrees for Negro, bilingual
and foreign-born, illiterate, retarded, defective,
rural, and other special groups for which the test
was not designed, there is no adequate measure
that can be applied to all. On the other hand,
because of the extensive research on the WISC,
reported below, it may be possible to estimate
errors in the Vocabulary and Block Design sub-
tests and in the scores derived from them for
various components of the Survey sample. In ad-
dition, relationships of these variables to the
Goodenough Draw-A-Man Test offer further op-
portunities for compensatory analysis.
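One conventional way to express such measurement error, assumed here rather than taken from the report, is the classical standard error of measurement, SEM = SD × sqrt(1 - r). The sketch below applies it to the scaled-score SD of 3 and to the split-half coefficients at age 7 1/2 that the Wechsler manual (101) reports for Vocabulary (0.77) and Block Design (0.84); the function names are illustrative, and the 95 percent band is the usual normal-theory approximation.

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Classical test theory: SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)


def score_band(observed: float, sem: float, z: float = 1.96) -> tuple[float, float]:
    """Approximate band of +/- z SEMs around an observed score."""
    return observed - z * sem, observed + z * sem


SCALED_SCORE_SD = 3.0  # scaled scores have SD 3 at every age level

# Split-half coefficients at age 7 1/2 from the WISC manual, as cited in the text.
for subtest, r in [("Vocabulary", 0.77), ("Block Design", 0.84)]:
    sem = standard_error_of_measurement(SCALED_SCORE_SD, r)
    low, high = score_band(10.0, sem)
    print(f"{subtest}: SEM = {sem:.2f}; 95% band around 10: {low:.1f} to {high:.1f}")
```

For these inputs the SEM is about 1.44 scaled-score points for Vocabulary and 1.20 for Block Design, so an observed scaled score of 10 is compatible with true scores roughly 2.4 to 2.8 points on either side. Estimates for special groups in the Survey sample would need reliabilities obtained in those groups.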

RESEARCH ON SHORT FORMS
OF THE WISC

Several investigators have combined two or 
more subtests in order to develop an efficient 
short form of the WISC that correlates well with 
the Full Scale and produces comparable means 
and standard deviations (175-179, 231, and 235). 
Of these, only one article, by Simpson and Bridges 
(177), reported favorable results with the combi- 
nation of Vocabulary and Block Design. They used 
a sample of 120 children over the age range of
65 to 192 months. 

Finley and Thompson (231) developed for a 
sample of 309 mentally retarded persons a short 
form with five subtests, including Block Design, 
which correlated 0.89 with FS IQ. Significantly,
their report included correlations of 0.55 and 
0.45, respectively, for Voc. and BD with FS IQ, 
while the correlation of Voc. and BD was only 0.1. 
Further, estimation of mean FS IQ by proration of 
the sum of Voc. and BD, as reported by these 
authors, approximated the actual FS IQ quite 
closely. 
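The significance of the low Voc.-BD correlation can be illustrated with the standard formula for the correlation of a sum of two variables with a third. The sketch below assumes that Voc. and BD have equal standard deviations (true of scaled scores by construction, though not necessarily within a special sample such as Finley and Thompson's) and plugs in the three coefficients quoted above; the result is an illustrative estimate, not a figure reported by those authors.

```python
import math


def composite_correlation(r_ac: float, r_bc: float, r_ab: float) -> float:
    """Correlation of (A + B) with C, assuming A and B have equal SDs.

    r_ac, r_bc: correlations of A and B with C; r_ab: correlation of A with B.
    """
    return (r_ac + r_bc) / math.sqrt(2.0 + 2.0 * r_ab)


# Coefficients quoted above for Finley and Thompson's sample.
r_sum_fs = composite_correlation(r_ac=0.55, r_bc=0.45, r_ab=0.1)
print(f"Estimated r(Voc. + BD, FS IQ) = {r_sum_fs:.2f}")

# For comparison: the same subtest validities with a higher intercorrelation.
print(f"Same validities, r_ab = 0.6: {composite_correlation(0.55, 0.45, 0.6):.2f}")
```

The estimate is about 0.67 with an intercorrelation of 0.1 and about 0.56 with 0.6: the less the two subtests overlap, the more each adds to the prediction of the Full Scale.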

Schwartz and Levitt (235) also reported a
short form of the WISC for educable retarded chil-
dren, consisting of six subtests including Voc. and
BD, which correlated 0.95 with FS IQ. However,
their best combination of five subtests, which re-
duced the correlation to 0.92, eliminated Block
Design. Osborne and Allen (239), on the other
hand, cross-validated two triads of WISC subtests
including Voc. and BD, one with Picture Com-
pletion and one with Picture Arrangement, using
samples of 240 (initial) and 50 (validation) retarded
children aged 7 to 14 years, with correlations with
FS IQ of 0.88 to 0.90.

At the same time, Hite (112) has confirmed
Wechsler's data (101) indicating that Vocabulary
and Block Design are the most reliable subtests
in the WISC battery. Hagen (109) and Cohen (111)
in the United States and Gault (110) in Australia
have reported that both of these subtests are
highly loaded on the general factor obtained in
factor analysis of the WISC over the entire age
range of 5 to 15 years. Cohen found that Vocabu-
lary was the strongest single measure of the
general factor. Nevertheless, a problem exists in
determining the optimal combination of these sub-
tests to estimate the FS IQ and various parameters
related to the Survey objectives.

Simpson and Bridges (177) estimated the FS
IQ on the basis of a simple sum of the scaled
scores of Voc. and BD and reported a conversion
table for this purpose. Inasmuch as their results
have not been replicated, so far as is known,
cross-validation on a substantial sample should
be considered before this table is adopted. The
importance of this recommendation is illustrated
by some computations based on the Finley and
Thompson data (231). The sum of mean Voc. and
BD scaled scores, 11, multiplied by 5 to prorate
the FS score (the Full Scale sums 10 subtests, so
11 × 5 = 55), gives a WISC Full Scale IQ of 70
(as compared with the actual mean of 68), while
the score of 11 in the Simpson and Bridges tables
yields an FS IQ of 77. Further, in view of Max-
well's criticism of the transformation of raw
scores to scaled scores (106), it may be advisa-
ble also to explore empirically the alternative
of predicting the FS IQ from raw scores.
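The proration step in the Finley and Thompson comparison can be written out explicitly. This sketch assumes simple proration, scaling the short-form sum by the ratio of routinely used Full Scale subtests (10) to subtests actually given (2); converting the prorated sum to an IQ still requires Wechsler's conversion table, which is not reproduced here, and the function name is hypothetical.

```python
def prorate_full_scale_sum(short_form_sum: float,
                           subtests_given: int = 2,
                           full_scale_subtests: int = 10) -> float:
    """Prorate a short-form sum of scaled scores to a Full Scale sum.

    With Voc. and BD (2 of the 10 routinely used subtests) the factor is 10 / 2 = 5.
    """
    if subtests_given <= 0:
        raise ValueError("subtests_given must be positive")
    return short_form_sum * full_scale_subtests / subtests_given


# Finley and Thompson's mean Voc. + BD scaled-score sum, as quoted above.
prorated = prorate_full_scale_sum(11)
print(prorated)  # 55.0: the prorated FS sum, to be looked up in the WISC IQ table
```

Proration assumes the omitted subtests would have been scored at the same average level as the two given; the Simpson and Bridges table instead maps the two-subtest sum directly to an IQ, which is why the two approaches can diverge as shown above.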

In reviewing the WISC literature every effort
was made to focus on the Voc. and BD subtests,
and considerable data have been assembled. 
Nevertheless, the major portion of the information 
referred to in this report is based on the full test, 
and assumptions of equivalence of short form 
scores to the Full Scale must be made in gener- 
alizing the results reported. As indicated above, 
this assumption is not entirely inappropriate, but 
caution is certainly indicated. 

RELIABILITY AND STABILITY 

Wechsler's manual (101, p. 13) reported cor-
rected split-half reliability coefficients of 0.77,
0.91, and 0.90, respectively, for Vocabulary, and
0.84, 0.87, and 0.88, respectively, for Block De-
sign for samples of 200 children at each of the
following age levels: 7 1/2, 10 1/2, and 13 1/2
years. The corresponding FS reliabilities were
0.92, 0.95, and 0.94, respectively. As noted above,
these two subtests were the most reliable of all the
WISC subtests. These results for Voc. and BD have
been confirmed by Hite (112) for children in the
age range of 5 to 7 years.
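A corrected split-half coefficient is usually obtained by correlating scores on two halves of a test and stepping the result up to full length with the Spearman-Brown formula, r_full = 2r / (1 + r). The report does not spell out the manual's exact procedure, so the sketch below simply assumes an odd-even split and the Spearman-Brown correction; the item data in the example are invented.

```python
from statistics import fmean


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation of two equal-length score lists."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_brown(r_half: float, length_factor: float = 2.0) -> float:
    """Predicted reliability when test length is multiplied by length_factor."""
    return length_factor * r_half / (1.0 + (length_factor - 1.0) * r_half)


def split_half_reliability(item_scores: list[list[float]]) -> float:
    """Odd-even split-half reliability, corrected with Spearman-Brown.

    item_scores holds one list of item scores per child.
    """
    odd_half = [sum(items[0::2]) for items in item_scores]
    even_half = [sum(items[1::2]) for items in item_scores]
    return spearman_brown(pearson_r(odd_half, even_half))


# Invented item scores for five children on a six-item test.
scores = [
    [2, 2, 1, 1, 0, 1],
    [1, 0, 1, 1, 0, 0],
    [2, 2, 2, 1, 2, 2],
    [0, 1, 0, 0, 1, 0],
    [1, 1, 2, 2, 1, 1],
]
print(f"Corrected split-half reliability: {split_half_reliability(scores):.2f}")
```

Under this correction, a correlation of 0.63 between the two halves corresponds to a full-length coefficient of about 0.77, the value reported for Vocabulary at age 7 1/2.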

Stability of the WISC on retest has also been
found satisfactory by Gehman and Matyas (113)
over a 4-year period (age 11 years at initial test),
by Reger (115), who tested a sample at ages 10,
11, and 12 years, and by Whatley and Plant (116),
who used a 17-month interval. In these studies,
retest correlations were generally of the order of
the corrected split-half reliabilities. These and
related data are summarized in table 1.

VALIDITY 

Despite the fact that Wechsler developed the 
WISC in protest against the measurement concept
of mental age (and the IQ based on it) implicit in 
the Stanford-Binet test, and despite the additional 
Table 1. Studies reporting reliability coefficients of the WISC

Investigator | Year | Subjects(a) | Age range | Σ | M | F | Voc. | BD | VS | PS | FS

Throne, Schulman, and Kaspar (227) | 1962 | Retarded | 11-0 - 14-11 | 39 | 39 | | 0.79 | 0.82 | 0.92 | 0.89 | 0.95

Armstrong (175) | 1955 | Guidance clinic | 5-0 - 14-11 | 200 | 100 | 100 | 0.94 | N.R. | N.R. | N.R. | N.R.
 | | | 5-7 years | 20 | 20 | | 0.92 | N.R. | N.R. | N.R. | N.R.
 | | | 5-7 years | 20 | | 20 | 0.90 | N.R. | N.R. | N.R. | N.R.
 | | | 7-9 years | 20 | 20 | | 0.93 | N.R. | N.R. | N.R. | N.R.
 | | | 7-9 years | 20 | | 20 | 0.91 | N.R. | N.R. | N.R. | N.R.
 | | | 9-11 years | 20 | 20 | | 0.87 | N.R. | N.R. | N.R. | N.R.
 | | | 9-11 years | 20 | | 20 | 0.89 | N.R. | N.R. | N.R. | N.R.
 | | | 11-13 years | 20 | 20 | | 0.88 | N.R. | N.R. | N.R. | N.R.
 | | | 11-13 years | 20 | | 20 | 0.88 | N.R. | N.R. | N.R. | N.R.
 | | | 13-15 years | 20 | 20 | | 0.90 | N.R. | N.R. | N.R. | N.R.
 | | | 13-15 years | 20 | | 20 | 0.96 | N.R. | N.R. | N.R. | N.R.

Gehman and Matyas (113)(b) | 1956 | | 11-1 | 60 | 29 | 31 | N.R. | N.R. | 0.77 | 0.74 | 0.77

Caldwell (252) | 1954 | Normals (Negro) | 9-7 - 10-6 | 60 | | | 0.70 | 0.89 | 0.82 | 0.90 | 0.84

Jones (154) | 1962 | Normals (England) | | 240 | 120 | 120 | | | | |
 | | | 7-6 - 8-5 | 80 | 40 | 40 | 0.70 | 0.74 | 0.86 | 0.80 | 0.89
 | | | 8-6 - 9-5 | 80 | 40 | 40 | 0.70 | 0.68 | 0.87 | 0.81 | 0.90
 | | | 9-6 - 10-5 | 80 | 40 | 40 | 0.70 | 0.75 | 0.90 | 0.85 | 0.94

Wechsler (101) | 1949 | Normals (WISC standardization data) | | 600 | 300 | 300 | | | | |
 | | | 7-6 | 200 | 100 | 100 | 0.77 | 0.84 | 0.88 | 0.86 | 0.92
 | | | 10-6 | 200 | 100 | 100 | 0.91 | 0.87 | 0.96 | 0.89 | 0.95
 | | | 13-6 | 200 | 100 | 100 | 0.90 | 0.88 | 0.96 | 0.90 | 0.94

Hite (112) | 1953 | | | 200 | 117 | 83 | | | | |
 | | | 5-6 | 50 | 34 | 16 | 0.71 | 0.77 | 0.77 | 0.81 | 0.90
 | | | 6-6 | 100 | 56 | 44 | 0.72 | 0.84 | 0.89 | 0.89 | 0.91
 | | | 7-6 | 50 | 27 | 23 | 0.76 | 0.89 | 0.89 | 0.86 | 0.9?

Hagen (109)(c) | 1952 | Normals (WISC standardization data) | | 400 | 200 | 200 | | | | |
 | | | 5 years | 200 | 100 | 100 | 0.68 | 0.77 | N.R. | N.R. | N.R.
 | | | 15 years | 200 | 100 | 100 | 0.91 | 0.89 | N.R. | N.R. | N.R.

(a) Designations of subjects are always white Americans unless otherwise specified.
(b) Time between testings was 49 months.
(c) Data are from the WISC standardization sample but were not reported in the WISC manual.

NOTES: All correlation coefficients are Pearson product-moment unless otherwise specified.

Σ = total population; M = male; F = female; Voc. = Vocabulary; BD = Block Design; VS = Verbal Scale; PS = Performance
Scale; FS = Full Scale; N.R. = not reported.
Table 2. Studies reporting correlation between the WISC and Stanford-Binet(a)



Investigator



Nale (216)

Stacey and Levin (228)

Sloan and Schneider (217)

Orr (188)

Sharp (229)

Post (198)

Kent and Davis (207)

Muhr (119)

Davidson (162)

Kardos (161)

Matyas(d) (114)

Raleigh (191)

Schwitzgoebel (189)

Clarke (160)

Frandsen and Higginson (159)

Reidy (171)

Jones (154)



Arnold and Wagner (158)

Wagner (156)

Scott (135)

Beeman (153)

Harlow, Price, Tatham, and
Davidson (145)

Cohen and Collier (124)

Tatham (152)



Mussen, Dean, and Rosenberg 1952 Normals

(ll/>. 



Year 



1951 
1951 
1951 
1950 
1957 
1952 
1957 



1952 



1954 
1954 
1954 

1952 
1952 
1950 
1951 
1952 
1962 



Subjects 



1957 



1952 
1952 



Mental defectives

Mental defectives

Mental defectives

Retarded

Slow learners

Stutterers



Normals and clinic referrals
(England)

Normals

Delinquents

Psychiatric outpatients



Institutional (orphans and
various problems)



Normals

Normals

Normals

Grade 5

Grade 9 (retest)

Normals

Normals

Normals

Normals

Normals

Normals (England)



1955 Normals
1951
1950
1960



Normals

Normals

Normals



Normals



Normals
Normals



Age range 



8-10 - 15-11

7-2 - 15-11

N.R. 
N.R. 

8-0 - 16-5
5-5 - 15-10 

8-12 years 



5-0 - 6-11 

5 years 

6 years 

14-0 - 14-3 
11-11 - 13-0



11-1 (mean) 
15-2 (mean) 

10-8 - 14-9

9-11 - 13-8 

9-7 - 12-9 

9-1 - 10-3 

9-0 - 11-11 

8-10 years 

8 years
8 years

8 years 

9 years 
9 years 

9 years
10 years
10 years
10 years

8-9 years 
8-9 years 
7-7 - 11-1 
7-2 - 11-9 



6-6 - 6-7 

10-0 - 10-1 

6-5 - 8-9 

6-5 - 6-7 

6-0 - 13-1 



Number 


Correlation 


Σ


M


F


Voc.


BD


VS


PS 


FS 


104 


54 


50 


N.R. 


N.R. 


N.R. 


N.R. 


0.91 


70 






N.R. 


N.R. 


N.R. 


N.R. 


0.68 


40 


20 


20 


N.R. 


N.R. 


0.75 


0.64 


0.76 


10 






N.R. 


N.R. 


^0.81 


0.49 


^0.71 








N.R. 


N.R. 


0.62 


0.67


0.69


30 


27 


3 


N.R. 


N.R. 


0.80 


0.37 


0.78 


213 
118 
55 


133 
59 
48 

OIL 


80 
59 
7 


N.R. 


N.R. 


N.R. 


0.58 


N.R. 












14 






















42 
21 
21 


— 


— 


N.R. 
N.R. 
N.R. 


N.R. 
N.R. 
N.R. 


0.46 
0.65 
0.44 


0.52 
0.66 
0.39 


0.62 
0.74 
0.49 


30 




— 


N.R. 


N.R. 


0.79 


0.71 


0.83 


100 


50 


50 


N.R. 


N.R. 


0.87 


0.82 


0.89 


60 
60 
60 


29 
29 
29 


31 
31 












N.R. 
N.R.


N.R.
N.R.


0.78
0.76


0.46
0.64


0.73 
0.77 


100 


52 


48 


N.R. 


N.R. 


0.77 


0.59 


0.80 


100 


52 


48 


N.R. 


N.R. 


0.78 


0.61 


0.84 


84 


39 


45 


N.R. 


N.R. 


0.83 


0.57 


0.79 


54 






N.R. 


N.R. 


0.71 


0.63


0.80 


60 


30 


30 


N.R. 


N.R. 


0.87 


0.69 


0.86 


240 
40 
40 
80 
40 

80 
40 
40 
80 


120 
40 

40 
40 

40 
40 

40 


120 

40 
40 

An 
40 

- 

40 
40 


N.R. 
N.R. 
N.R. 
N.R. 
N.R. 
N.R. 
N.R. 
N.R. 
N.R. 
N.R.


N.R. 
N.R. 
N.R. 
N.R. 
N.R. 
N.R. 
N.R. 
N.R. 
N.R.
N.R. 


0.84 
0.77 
0.79 
0.78 
0.89 
0.78
0.84
0.86
0.90
0.88


0.59
0.48
0.46
0.47
0.65
0.58
0.61
0.64
0.67
0.66


0.81
0.72
0.76
0.74
0.90
0.75
0.84
0.83
0.86
0.85


5C 






N.R. 


N.R. 


0.85 


0.75 


0.88 


An 






N.R. 


N.R.


0.77


0.87


0.61


In 

JU 






0.63 


0.60


0.86


0.86


0.92


36 







N.R. 


N.R. 


0.64 


0.42 


0.67 


60 
30 
30 




















N.R. 
N.R. 


N.R. 
N.R. 


0.64 
0.88 


0.61 
0.52 


0.64 
0.83 


51 






N.R. 


N.R. 


0.82 


0.80 


0.85 


30 






N.R. 


N.R. 


0.64 


0.51 


0.C4 


39 






N.R. 


N.R. 


0.63 


0.72 


0.65 



See footnotes at end of table. 
Table 2. Studies reporting correlation between the WISC and Stanford-Binet(a) (Con.)



Investigator



Krugman, Justman, Wright-
stone, and Krugman (144)



Pastovic(e) (121)



Winpenny (105)



Dunsdon and Roberts (170)



loruszak (146)



olland (109)

Weider, Noller, and Schraumm
(150)



lureth, Muhr, and Weisgerber
(118)



Rottersman (151)

Triggs and Cartee (148)
Orr (188)



Stanley (157)

Schachter and Apgar (147)



Estes, Curtin, DeBurger, and
Denny (125)



Year 



1951



1951 
1951 

1955 

1954 

1953 
1951 

1952 

1950 
1953 
1950 

1955 
1958 



1961 



Subjects 



Normals



Normals



Normals

Kindergarten

Grade 2

Grade 5

Normals (England)



Normals



Normals
Normals



Normals



Normals

Normals (S-B, Form M)



Normals

Grade 1

Grade 4

Grade 7

Normals (from Frandsen and
Higginson, 159, above)

Normals, mixed sample

White

Negro

Puerto Rican

Oriental



Normals, Grades 1-8

Form L 

Form L-M 



Age range 



6 years 

7 years 

8 years 

9 years 
10 years 
11 years



5-6 
7-6 



5-4 - 5-8 
7-4 - 7-8 
9-7 - 12-9 

5-0 - 14-11 



5-14 years 
5-14 years 
5-14 years 

5-13 years 

5-0 - 11-11 
5-0 - 7-11 
8-0 - 11-11 

5-6 years 

5 years 

6 years 

6 years 
5 years 



N.R. 
N.R. 
N.R. 

N.R. 
N.R. 



N.R.
N.R. 



Number 



222 
38 
43 
44 
31 
29 
37 

100 
50 
50 

185 
50 
50 
85 

1,947
980 
967 

80 
40 
40 

52 

106 
44 
62 

100 
50 
50 

50 
46 
40 
1. 
14 
11 

50 

113 
39 
66 
6 



82 
82 
82 



980 
980 



21 



61 



47 
47 
47 



967 

967 
40 

40 



29 



52 



Correlation 



N.R. 
N.R. 
N.R. 
N.R. 
N.R. 
N.R. 



N.R. 
N.R. 



N.R. 
N.R.
N.R. 



N.R. 
N.R. 

N.R. 
N.R. 
N.R. 

N.R. 

N.R. 
N.R. 
N.R. 

0.51 
0.42 
0.65 

N.R. 

N.R. 



N.R. 
N.R. 
N.R. 

N.R. 
N.R. 



N.R. 
N.R. 



BD VS PS FS 



N.R. 
N.R. 
N.R. 
N.R. 
N.R. 
N.R. 



N.R. 
N.R. 



N.R. 
N.R. 
N.R. 



N.R. 
N.R. 

N.R. 
N.R.
N.R. 

N.R. 

N.R. 
N.R. 
N.R. 

0.61 
0.65 
0.55 

N.R. 

N.R. 



N.R. 
N.R. 
N.R. 

N.R.
N.R. 



N.R. 
N.R. 



0.73 
0.64 
0.78 
0.83 
0.68
0.69 



0.63 
0.82 



N.R. 
N.R. 
N.R. 



N.R. 
N.R. 

0.87 
0.89 
0.86 

0.88 

0.89 
0.82 
0.92 

0.75 
0.79 
0.71 

0.71 

0.58 



0.63 
0.64 
0.88 

N.R. 
0.64 



N.R. 
N.R. 



0.74 
0.49 
0.57 
0.79 
0.54 
0.53 



0.57 
0.71 



N.R. 
N.R. 
N.R. 



N.R. 
N.R. 

0.78 
0.72 
0.71 

0.73 

0.77 
0.79 
0.78 

0.71 
0.73 
0.71 

0.49 

0.48 



0.62 
0.65 
0.66 

N.R. 
0.48 



N.R. 
N.R. 



(a) Unless otherwise noted, Stanford-Binet, Form L.

(b) Designations of subjects are always white Americans unless otherwise specified.
(c) Rank difference correlation. (d) Also reported by Gehman and Matyas in 1956.

(e) Also reported by Pastovic and Guthrie in 1951. (f) Intraclass correlation.

(g) Average time between S-B and WISC administration was 50.8 months.

NOTES: All correlation coefficients are Pearson product-moment unless otherwise specified.

Σ = total population; M = male; F = female; Voc. = Vocabulary; BD = Block Design; VS = Verbal Scale; PS = Performance
Scale; FS = Full Scale; N.R. = not reported.
Table 3. Studies reporting correlation between the WISC and other measures



Investigator



Smith (126)

McBrcarty (123)



Cohen and Collier
(124).



Winpenny (105)



Armstrong and Hauck
(130).



Winpenny (105)



Cooper (242)
Altus (122)
Altus (134)



Cooper (242)



Schwitzgoebel
(189).

Barratt (138)



Warren and Collier
(224).

Thompson (193)



Warren and Collier
(224).

Armstrong and
Hauck (130).

Rottersman (151)
Kimbrell (136)



Year



Smith (126)



Delp (135)
Cooper (242)



Sharp (229)



1961 
1951 

1952 

1951 

1960 

1951 

1958 
1952 
1955 

1958 
1952 
1956 
1960 
1961 



1960 

1960 

1950 

1960 

1961 

1953 
1958 

1957 



Test or criterion
variable

Full Range Picture
Vocabulary Test.

Arthur Point Scale
of Performance
Tests.

Arthur Point Scale
of Performance
Tests.

Arthur Point Scale
of Performance
Tests.

Visual Motor Ge-
stalt Test.

Bernreuter-Winpenny

California Achieve-
ment Tests.

California Test of
Mental Maturity.

California Test of
Mental Maturity.

Language

Non-language"

Total

California Test of
Mental Maturity.

California Test of
Mental Maturity.

Columbia Mental
Maturity Scale.

Columbia Mental
Maturity Scale.

Gates Advanced
Primary Reading
Tests.
Word Recognition'
Paragraph Reading'
Composite Reading'

Goodenough Intelli-
gence Test.

Goodenough Intelli-
gence Test.

Goodenough Intelli-
gence Test.

Grade placement

Wide Range
Achievement Test.

Kent EGY Test

Leiter Interna-
tional Perform-
ance Scale.

Leiter Interna-
tional Perform-
ance Scale.



Subjects*

Normals

Normals

Normals

Normals

Nonorganic child
guidance popu-
lation.

Normals

Kindergarten

Grade 2

Grade 5

Bilinguals
(Guam), Grade 5.

Normals, junior
high.

Retarded, elemen-
tary school.

Bilinguals
(Guam), Grade 5.

Normals

Normals

Retarded

Normals

Retarded

Child guidance
clinic.

Normals

Mental defec-
tives.

Bilinguals
(Guam), Grade 5.

Slow learners



Age range 



6-11 - 8-10 
10-3 - 12-11 

6-5 - 8-9 

9-7 - 12-9 

6-12 years 



5-4 - 5-8 
7-4 - 7-8 
9-7 - 12-9 

N.R. 
N.R. 
N.R. 



N.R. 
9-11 - 13-8 
9-2 - 10-1
9-30 years 
6-4 - 8-0 



9-30 years 

6-12 years 

6 years 

10.5 - 15.8 

6-11 - 8-10 

6-15 years 
N.R. 

8-0 - 16-5 



Number 



100 
52 

49 

85 

98 



50 
50 
85 

51 
55 
100 



51
100 
60 
49 
105 



49 

98 

50 

62 

100 

74 
51 

50 



62 



49 
21 



51 



49 
30 



29 



43 



49 
29 



49 



Correlation 



Voc. BD



N.R. 
N.R. 

N.R. 

N.R. 

N.R. 



N.R.
N.R. 
N.R. 

N.R.
N.R. 



N.R. 
N.R. 
N.R. 
N.R. 

N.R. 

0.45 

N.R. 



N.R. 
N.R. 
N.R. 

N.R. 

N.R. 

N.R. 

N.R. 

N.R. 

N.R. 
N.R. 



N.R. 
N.R. 

N.R. 

N.R. 



N.R.
N.R.
N.R.

N.R. 
N.R. 



N.R. 

N.R.
N.R.
N.R.

N.R.

0.47 

N.R. 



N.R.
N.R. 
N.R. 

N.R. 

N.R. 

N.R. 

N.R. 

N.R. 

N.R. 
N.R. 



VS 



0.63 
N.R. 

"0.77 

N.R. 
-0.22 



N.R. 
N.R. 
N.R. 

0.80 
N.R. 



0.71 
0.65 
0.76 

0.66 
0.55 
0.56 
N.R. 



0.58 
0.55 
0.57 

N.R. 

0.37 

0.38

N.R. 

0.55 

0.60 
0.73 



PS 



0.42 
0.65 

'0.81

N.R.

.0.07 



N.R. 
N.R. 
N.R. 

0.54 



0.57 
0.67
0.68 

0.68 
0.59 
'0.48 
N.R. 



0.42
0.46 
0.47 

N.R. 

0.51 

0.43 

N.R. 

0.47 

0.55 
0.78 



FS 



0.60
0.7). 

^O.SO 

0.70 

-0.23 



0.92 
0.92 
0.97 

0.77 



0.70 
0.(^8 
0.77 

0.74 
0.75 
0.61 
0.68 



0.55 
0.56 
0.56 

0.43 

0.49 

0.47 

0.40 

0.61 

0.62 
0.63 



See footnotes at end of table.

Table 3. Studies reporting correlation between the WISC and other measures (Con.)



Investigator 



Alper (221)

Dunn and Brooks
(234).

Kimbrell (136)

Himelstein and
Herndon (137).

McBrearty (123)

Dunsdon and
Roberts (170).

Brown, Hakes, and
Malpass (233).

Malpass, Brown,
and Hakes (140).

Barratt (138)
Wilson (139)

Martin and Wiech-
ers (142).

Stacey and Carle-
ton (141).

Hite (112)

Stempel (143)

Jones (154)

Stark (163)
Bacon (127)

Delattre and Cole
(128).



1958 

1960 
1960 
1962 
1951 

1955 



1959 
1960 
1956 
1952 



Test or criterion
variable 



1954 
1955 
1953 



Lelter Interna- 
tional Perform- 
ance Scale. 

Peabody Picture
Vocabulary Test.

Peabody Picture
Vocabulary Test.

Peabody Picture
Vocabulary Test.

Progressive
Achievement
Tests.

Mill Hill Vocabu-
lary Scale.

Form A

Form A

Form B

Form B

Raven Progressive 
Matrices. 

Raven Progressive 
Matrices . 

Raven Progressive 
Matrices . 

Raven Progressive 
Matrices. 



1962 



1954 



1952 



Coloured Progres- 
sive Matrices. 

Coloured Progres- 
sive Matrices. 

SRA Primary Mental 
Abilities Test. 



Verbal 

Perception— - 
Quantitative- 
Space 



SRA Primary Mental 
Abilities. 



Space 

Number 

Reasoning 

Perception 

Verbal 

IQ 

Teacher ratings-- 



The Drawing- 
Completion Test. 

Wechsler-Bellevue
Intelligence
Scale, Form I.

Wechsler-Bellevue
Intelligence
Scale, Form I.



Subjects 



Mental defec- 
tives. 



Retarded* 



Mental defec- 
tives. 

Emotionally 
disturbed. 
Normals 



Normals 
(England) . 



Retarded
Retarded.
Normals

British Columbia
Hospitalized
Americans
Indians.

Hospitalized
whites^

High socioeco-
nomic whites.

Normals

Mental defec-
tives.
Normals 



Superior 
intelligence . 



Normals 
(England) . 



Normals

Normals 

Normals 



Age range 



7-2 - 17-3 

N.R. 
10.5 - 15.8 
6-2 - 14.8 
10-3 - 12-11

5-0 - 14-11 



N.R. 
11-8 (mean)
9-2 - 10-1 
5-6 - 13-0 



9-0 - 10-0 
7-5 - 15-9 
5-6 years 



8-5 - 10-4 



7-6 - 10-5

8 years
9 years
10 years

8-4 - 9-10
11-9 - 12-3



10-5 - 15-7 



30 

56 
62 
48 
52 

1^47 

980 
967 
980 
967 
N.R. 

104 

60 

90 
30 

30 
30 

100 
150 
50 



50 



15 



22 

980 
980 

980 



26 



34 



240 

80 
80 
80

50 
32 



120 

iVO 
40 
40 

30 



15 



30 
967 

967 
967 

34 



40 



16 



Correlation 



N.R. 

N.R. 
N.R. 
N.R. 
N.R. 



0.83 
0.81 
0.85 
0.82 
N.R. 

N.R. 

0.56 

N.R. 



120 

40 
40 
40 

20 
16 



0.73 



N.R. 

0.36 



0.45 
0.30 
0.35 
0.39 



N.R. 

N.R. 

N.R. 

N.R. 

N.R. 

N.R. 

N.R. 

N.R. 
N.R. 
N.R. 

0.72 
0.84 



BD 



N.R. 

N.R. 
N.R. 
N.R. 
N.R. 



N.R. 
N.R. 
N.R. 
N.R. 
N.R. 

N.R. 

0.60 
N.R. 



0.74 



N.R. 
0.41 



0.38 
0.83 
0.53
0.68 



N.R. 
N.R. 



N.R. 
0.49 



0.65 



0.40 

N.R. 
N.R. 
0.64 
0.78 



N.R. 
N.R. 
N.R. 
N.R. 
N.R. 

N.R. 

0.69 

N.R. 



N.R. 



0.86 



0.86 



PS 



0.79 

N.R. 
N.R. 
0.52 
0.50 



N.R. 
N.R. 
N.R. 
N.R. 
N.R. 

N.R. 

0.70 

N.R. 



0.83 



.0.52 
0.55 



N.R. 
N.R. 
N.R. 
N.R. 



0.34 

0.38 

0.55 

0.42 

0.40 

N.R. 

0.57 

0.48
0.59 
0.62 



0,6^^ 



0.82 



^Subjects are white Americans unless otherwise specified.
^WISC scaled scores. ^Partial correlations with chronological age removed.

^Raw scores. ^Scaled scores.

Eta coefficient.

NOTES: All correlation coefficients are Pearson Product-Moment unless otherwise specified.

Σ = total population; M = male; F = female; Voc. = Vocabulary; BD = Block Design; VS = Verbal Scale; PS = Performance Scale; FS = Full Scale; N.R. = not reported.

fact that the validity of the WISC must be judged principally in relation to the logic of Wechsler's approach and the adequacy of his development and standardization of the test, a surprisingly large number of papers dealing with the validity of the WISC have used the Stanford-Binet as a criterion. As may be expected, unless one assumes naively that the theoretical objections to mental age scores involve gross discrepancies, which they usually do not, the correlations between WISC Full Scale IQ's and Stanford-Binet IQ's are generally high, in about the same range as the respective reliabilities of these tests. (See table 2.) There seems to be little doubt that both the WISC and the Stanford-Binet merit their reputations as outstanding individual intelligence tests.
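The comparison with reliability can be made concrete with Spearman's correction for attenuation, which estimates the correlation between two tests' true scores from their observed correlation and their reliabilities. The sketch below is illustrative only: the reliability and correlation values are hypothetical, not figures from table 2. When the observed correlation is about as high as the reliabilities, the corrected value comes out close to 1.0.

```python
import math

def disattenuated_r(r_xy: float, r_xx: float, r_yy: float) -> float:
    """Spearman's correction for attenuation.

    r_xy: observed correlation between tests X and Y
    r_xx, r_yy: reliability coefficients of X and Y
    Returns the estimated correlation between the true scores.
    """
    if not (0 < r_xx <= 1 and 0 < r_yy <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_xy / math.sqrt(r_xx * r_yy)

# Hypothetical values: an observed WISC FS / S-B correlation of 0.85 and
# a reliability of 0.90 for each test.
print(round(disattenuated_r(0.85, 0.90, 0.90), 3))  # 0.944
```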

There are, however, differences between the WISC and Stanford-Binet in score levels. As noted above, WISC IQ's tend to be substantially lower than the corresponding Stanford-Binet IQ's for the very young and for the gifted (153 and 215), as well as for many samples reported across the normal range (119, 120, 124, 147, 148, 151, 154, 156, 159, and 161). This problem is discussed below.

The WISC has been correlated with a wide range of verbal and performance tests that purport to measure various aspects of intelligence. Correlations with the Wechsler-Bellevue, Form I, have been reported by Bacon (127) for a sample of 36 children in the age range 11 years 9 months to 12 years 3 months and by Delattre (128) for 50 students aged 10-5 to 15-7. Their results for FS were 0.77 and 0.87, respectively, while both correlated 0.86 for VS. For PS their respective correlations were 0.65 and 0.82; for Voc., 0.84 and 0.55. Finally, for BD their results were 0.65 and 0.49. Variations of this magnitude are to be expected for small samples drawn from different settings. Dunsdon and Roberts (170) administered four vocabulary tests, including that of the WISC, to 2,000 British children and obtained intercorrelations exceeding 0.8 for both sexes.
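How much variation small samples allow can be gauged with an approximate confidence interval for a correlation, computed through Fisher's r-to-z transformation. The sketch below applies it to the two Vocabulary correlations quoted above (Bacon, n = 36, r = 0.84; Delattre, n = 50, r = 0.55). The 95 percent level and the normal approximation are assumptions of this illustration, not part of either study.

```python
import math

def fisher_ci(r: float, n: int, z_crit: float = 1.96) -> tuple[float, float]:
    """Approximate confidence interval for a Pearson r (95% by default),
    using Fisher's r-to-z transformation with standard error 1/sqrt(n - 3)."""
    if n <= 3:
        raise ValueError("need more than 3 cases")
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

# Vocabulary correlations quoted in the text.
for label, r, n in [("Bacon (127)", 0.84, 36), ("Delattre (128)", 0.55, 50)]:
    lo, hi = fisher_ci(r, n)
    print(f"{label}: r = {r:.2f}, n = {n}, 95% CI {lo:.2f} to {hi:.2f}")
```

Under these assumptions the intervals run roughly from 0.71 to 0.92 and from 0.32 to 0.72. Both are wide, which illustrates why single coefficients from samples of this size should be compared with caution.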

Table 3 summarizes reported correlation coefficients between WISC scores and other tests of intelligence, mental maturity, and achievement in school subjects, teacher ratings, and related criteria. For the FS IQ these are generally quite high and positive, considering sample size and variation in sample composition and setting. In view of these variations, the specific coefficients are of less interest than the general trend, which supports the validity of the WISC as a general measure of what Wechsler labels "the total effective intelligence of the individual" (101, pp. 4 and 5).

For the purposes of a national survey, the robustness of the validity data across wide sample fluctuations is very encouraging. This robustness is revealed by the use of the WISC with samples of varying geographic and ethnic characteristics, with samples whose abilities range from defective to gifted, and with special groups such as retarded readers (133), bilinguals (242), stutterers (198), and low school achievers (190).

FACTORS AFFECTING WISC SCORES 

Both qualitative and quantitative variations in 
WISC scores have been reported by various inves- 
tigators in relation to a wide range of factors. 
Those discussed in this section are considered 
relevant to the objectives and problems of the 
Survey. Where feasible and appropriate, implica- 
tions and recommendations are noted. 

Anxiety 

Hafner, Pollie, and Wapner (132) and Carrier, Orton, and Malpass (205) have both reported negative correlations between the WISC FS and the Children's Manifest Anxiety Scale (CMAS), indicating that anxiety, as measured by this scale, tends to interfere with effective WISC performance. Hafner and others found a significant correlation of -0.31 between CMAS and BD. The Carrier study observed the relationship (-0.54) over a range of ability but not among the exceptionally bright. It appears to be most marked in the subnormal; Feldhusen and Klausmeier (167) found the following mean CMAS scores for three groups at different IQ levels: low IQ, 20.2; average, 14.8; and high, 12. These results are not entirely consistent with those of Burns (206), however, who found similar correlations between WISC Vocabulary and California Personality Test measures of Social Adjustment (0.55) and Personal Adjustment (0.45) but obtained nonsignificant coefficients of 0.12 and 0.10, respectively, for Block Design.

Although anxiety and adjustment may be regarded generally as factors that tend to depress WISC (Voc. and BD) scores for some segments of the child population on some occasions, it would seem unwise to attempt any correction for these factors. Presumably, some valid evidence on adjustment will become available from the Thematic Apperception Test (TAT), the School Information Form, and the extensive background and medical information being collected in the Health Examination Survey. However, the relationships are not clearly enough defined for firm quantitative manipulation. One alternative is to regard fluctuations on these variables as a source of error which may possibly be crudely estimated later but is probably well randomized in the total sample. Another is to accept the error pragmatically, on the view that depressed scores resulting from affective factors probably reflect a depressed ability of the individual to function effectively.

Sex Differences

The statement by McCandless (103), cited earlier, that boys do better on the WISC than girls, is not supported by the present review. Data on sex differences are presented in nine studies (130, 146, 154, 169, 175, 192, 194, 196, and 232), and only one (130) reports a significant mean difference favoring boys on FS IQ. However, none of them employed a sampling design that would encourage confidence in the group comparisons.

Some correlational differences mentioned by several authors do appear interesting. The correlation of WISC Full Scale IQ with the Bender-Gestalt was negative and larger for boys (-0.34, p<0.01) than for girls (-0.09, not significant) (130). The correlation of WISC Full Scale IQ with the Ammons Picture Vocabulary Test was 0.71 for boys and 0.45 for girls (169). The correlations of WISC FS and VS IQ's with the spelling subtest of the Iowa Test of Basic Skills were higher for boys than for girls. No data were reported in which sex differences favored girls. The absence of sex differences in studies of normal American (146) and English (154) children, deaf American (194) and English (196) children, and retarded American children (232) suggests considerable generality for the negative conclusion.

Qualitative Differences by Level 

Gallagher and Lucito (164) found a negative rank-order relationship between the subtest mean scores of gifted and retarded children on the WISC. The three highest and three lowest subtests for five comparison groups in their study are shown below.

These results agree with others, to be discussed below, which indicate that Block Design scores are least affected by population variations, in contrast with Vocabulary, which is one of the highest tests for the gifted groups and one of the lowest for the retarded.
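A rank-order relationship of the kind Gallagher and Lucito describe can be expressed with Spearman's rank-difference coefficient (the "rank difference correlation" footnoted in table 2). The minimal sketch below assumes untied ranks. The subtest ranks used are hypothetical, chosen only to show that a complete reversal of order between two groups gives a coefficient of -1.

```python
def rank_difference_rho(ranks_a, ranks_b):
    """Spearman's rank-difference coefficient, assuming no tied ranks:
    rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    if len(ranks_a) != len(ranks_b):
        raise ValueError("rank lists must be the same length")
    n = len(ranks_a)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Hypothetical ranks of 12 subtests (1 = highest mean) for a gifted and a
# retarded group whose orderings are exact reversals of each other.
gifted = list(range(1, 13))
retarded = list(range(12, 0, -1))
print(rank_difference_rho(gifted, retarded))  # -1.0
```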

Baroff (223) described a WISC profile for a sample of 53 low-IQ patients with a mean FS IQ of 63; Block Design was highest, and Vocabulary ranked 11 out of 12. Although Fisher (225) failed to verify the Baroff patterning, Baroff's results are in agreement with those of Gallagher and Lucito with respect to Vocabulary. Matthews (230) found that nonachievers in school tend to be higher on Block Design than on Vocabulary. Levinson (243 and 244), working with Jewish children in New York, and Altus (240), with Mexican and Anglo-American children in California, both found that monolinguals exceeded bilinguals on Vocabulary, but that the differences on Block Design



Group | Classification | Number of subjects (N) | Three highest subtests | Three lowest subtests
1 | Gifted | 50 | Similarities, Information, Vocabulary | Picture Completion, Picture Arrangement, Digit Span
2 | Gifted | 43 | Vocabulary, Information, Similarities | Picture Completion, Picture Arrangement, Digit Span
3 | Average | 565 | Arithmetic, Digit Symbol, Picture Arrangement | Block Design, Information, Similarities
4 | Retarded | 150 | Object Assembly, Picture Completion, Digit Span | Information, Vocabulary, Arithmetic
5 | Retarded | 52 | Object Assembly, Digit Span, Picture Completion | Vocabulary, Information, Picture Arrangement






were not significant. Burks and Bruce (186) found that poor readers score significantly high on Block Design, and Kallos, Grabow, and Guarino (180) obtained a significant difference between Block Design and Vocabulary, favoring Block Design, for a sample of poor readers.

Results such as these suggest the possibility of investigating a Voc./BD ratio which may prove to have some diagnostic use, in conjunction with the Goodenough Draw-A-Man Test, the Wide Range Achievement Test (WRAT), the Thematic Apperception Test, and school information, in evaluating various categories of subnormal and deviant performance such as those enumerated above.
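As a sketch of how such an index might be computed, the function below returns both the Voc./BD ratio and the simple difference for one child's WISC scaled scores. The function name and the example scores are hypothetical, and no diagnostic cutoff is implied; the review proposes the ratio only as something to investigate. Because a ratio is undefined for a zero denominator and unstable for very low ones, the difference may prove the steadier of the two indices.

```python
def voc_bd_indices(voc_scaled: float, bd_scaled: float) -> dict:
    """Return the Voc./BD ratio and the Voc. minus BD difference for one child.

    Both arguments are WISC subtest scaled scores (hypothetical helper).
    """
    if bd_scaled <= 0:
        raise ValueError("BD scaled score must be positive to form a ratio")
    return {"ratio": voc_scaled / bd_scaled, "difference": voc_scaled - bd_scaled}

# Hypothetical children: one with BD well above Voc., one with no discrepancy.
print(voc_bd_indices(6, 12))   # {'ratio': 0.5, 'difference': -6}
print(voc_bd_indices(10, 10))  # {'ratio': 1.0, 'difference': 0}
```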

On the Vocabulary subtest, Stacey and Port- 
noy (168) also observed qualitative differences 
between a borderline group (IQ range 66-79) and 
a defective group (IQ range 50-65) in conceptual 
approaches to word definition. Defectives ex- 
ceeded borderlines significantly in the use of 
functional definitions, while the borderlines were 
significantly higher in use of descriptive defini- 
tions. Neither group used abstract concepts to 
more than a slight degree. 

Carleton and Stacey (219) made an item analysis of the Vocabulary and Block Design subtests with a sample of 366 low-IQ children (mean FS IQ 67) and found four Voc. items and two BD items displaced from their expected order of difficulty. Because a short form depends more heavily on these two subtests than the full test does, the Survey staff might well consider repeating this study with a substantial sample.

Maxwell (211) observed that the WISC variances for a sample of neurotic children were greater than for a normal sample, which led him to criticize the transformations of raw scores to scaled scores. This point was also made by Wilson (139), whose work was with Indian children. Walker (209), in a highly creative study, enumerated a lengthy list of qualitative variations of WISC responses that appear to have promise for personality diagnosis. Walker's study merits further followup.

Developmental Factors 

Klausmeier and Check (166) investigated a number of developmental correlates of the WISC. They reported that children with high intelligence quotients grow taller than those in the average or low range, but that weight is not significantly related to sex or IQ. On strength of grip, they found low-IQ children weaker than those with average or high IQ's, the average group weaker than the high-IQ group, and girls weaker than boys. Girls were found to have more permanent teeth and a higher carpal age than boys of the same age. No sex differences or IQ differences were found in relation to emotional adjustment. Girls also exceeded boys on achievement in relation to capacity, integration of self concept, and estimation of own ability. These observations are of interest in suggesting cross-disciplinary analysis of psychological and biomedical data.

SPECIAL GROUPS 

The following discussion includes research on the WISC with reference to a number of special groups (those involving various disabilities, afflictions, deviations, social and ethnic characteristics, and other definitive attributes commonly recognized in the literature) for which at least some information has been found. Each of these groups involves some variables which affect WISC scores, and this review might properly have been included in the preceding section. However, most of the research referred to here was organized in terms of samples of persons in various categories rather than by underlying variables. As a result, the organization of the discussion follows the organization of the material reviewed.

Reading Disability 

As noted earlier, Kallos and others (180) found that Block Design scores were significantly higher than Vocabulary scores for a reading disability sample of 37 boys aged 9 to 14 years whose IQ's ranged from 90 to 109. The elevation of BD was supported by Burks and Bruce (186). Altus (181), Sheldon and Carton (182), and Karlsen (185) published WISC profiles for retarded readers, based on small but similar groups. No consistent pattern is unequivocally shown. Robeck (183) used a more sophisticated method to study subtest patterning of problem readers on the WISC, representing subtest scores as deviations of scaled scores from the respective age-group means. By this method problem readers were significantly higher than the norms on both Block Design and Vocabulary (as well as on Comprehension, Similarities, and Picture Arrangement) and lower on Digit Span, Arithmetic, Information, and Coding. Rogge (187) reported no significant differences on WISC VS, PS, or FS IQ's between a sample of 132 delinquents 14 to 16 years of age and a control sample of good readers.
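Robeck's method of expressing each subtest score as a deviation from the mean for the child's own age group can be sketched as follows. The function, data structures, and all numbers are hypothetical. In the WISC standardization each subtest's scaled-score mean is 10 at every age, so age-group means other than 10 would come from whatever reference sample an investigator chooses.

```python
from statistics import mean

def mean_deviation_profile(children, age_group_means):
    """Average deviation of each subtest scaled score from the mean for the
    child's own age group, across a sample of children.

    children: list of (age_group, {subtest: scaled_score}) pairs
    age_group_means: {age_group: {subtest: mean_scaled_score}}
    """
    deviations = {}
    for age_group, scores in children:
        means = age_group_means[age_group]
        for subtest, score in scores.items():
            deviations.setdefault(subtest, []).append(score - means[subtest])
    return {subtest: mean(values) for subtest, values in deviations.items()}

# Hypothetical data: two age groups, two subtests.
age_group_means = {
    "9": {"Vocabulary": 10.0, "Digit Span": 10.0},
    "10": {"Vocabulary": 10.5, "Digit Span": 9.5},
}
children = [
    ("9", {"Vocabulary": 12, "Digit Span": 8}),
    ("10", {"Vocabulary": 11, "Digit Span": 8}),
]
print(mean_deviation_profile(children, age_group_means))
```

A positive average marks a subtest on which the group stands above its age norms and a negative one a subtest on which it falls below, which is the form in which Robeck's findings are stated.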

Correlations of WISC scales with reading tests are generally moderate, in the range of 0.3 to 0.5 (171, 172, and 173). On the other hand, approaches involving score patterns or profiles, such as those discussed above, and qualitative analyses of responses, exemplified by the analyses of the understanding of the concept of opposite by Robinowitz (108) and by Flamand (172), appear to offer greater promise than linear regression methods for the evaluation of reading disability cases. The qualitative approach does not appear feasible with only Voc. and BD in the battery, but the pattern approach, as discussed above, merits consideration. In the Survey battery the WRAT is, of course, most directly related to estimation of reading disability, but a Voc./BD ratio may be a useful supplement.

Auditory Disability 

Murphy (196) administered the WISC to an 
equally divided sample of 300 deaf boys and girls 
in English schools for the deaf. Deaf children did 
not differ significantly from normal children on
the Performance Scale in this study, and there was 
no meaningful relation between hearing loss and 
PS. It is of interest, though, that Block Design 
correlated 0.71 with PS in this sample. In addition, 
teacher ratings of emotional adjustment corre- 
lated 0.76 with PS, suggesting that here also, as 
in the samples evaluated in relation to the Chil- 
dren's Manifest Anxiety Scale, anxiety may be 
a deterrent to effective performance. 

Graham and Shapiro (195) compared the performance of deaf and normal children on the WISC with standard and pantomime instructions. Both groups did equally well on PS with pantomime instructions, but the normals were superior with standard instructions. Mean scores on BD were approximately equal under all three conditions. For deaf children, then, the pantomime instructions are appropriate on BD.

Glowatsky (194) found that WISC Performance Scale IQ's were comparable with Draw-A-Man Test IQ's for a sample of 24 deaf and hard-of-hearing children in Santa Fe. PS scores were substantially higher than VS scores in this group, but bilingualism (noted in 13 cases) was not a factor.

Thompson gave Wepman's Auditory Discrimination Test, the WISC, and other tests of reading and auditory acuity to 105 children, including good and poor readers. She found that a significant and substantial proportion of first graders (71 percent) had inadequate auditory discrimination, but that this proportion was reduced to 24 percent by the second grade. Auditory Discrimination scores correlated more highly with reading (0.59 to 0.66) than with WISC IQ's (0.55 to 0.58). The correlation of Auditory Discrimination with WISC Verbal Scale IQ, the highest reported for any WISC scale, was 0.61.

Where hearing disability is noted by audiometer test, it would be advantageous to estimate intelligence level by a combination of Draw-A-Man and Block Design scores.

Visually Handicapped

According to a study by Scholl (197), the Block Design test may be administered with normal procedures to the partially blind. For the totally blind only the Vocabulary test would be appropriate in the Survey, and no data are available to evaluate their scores adequately.

Stutterers 

Post (198) found no significant differences 
between the mean scores of 30 stutterers and 30 
controls, predominantly boys in the age range of 
5-5 to 15-10, on the Stanford-Binet (S-B) and the
WISC. The correlation of WISC Full Scale IQ with
the S-B was 0.78 for the stutterers. The only
difference found between the two groups was in
the correlation of WISC Verbal Scale and Perform-
ance Scale IQ's, which was 0.26 for the stutterers
and 0.60 (the same as in Wechsler's standardiza-
tion sample) for the controls. Both group means
were higher on PS than VS.

Cerebral Palsy

Bortner and Birch (199) studied the administration of the Block Design subtest with twenty-eight 13-year-old cerebral palsied children. They found, as may be expected, that the ability to discriminate block designs in a choice situation may be intact even though motor factors impair the ability to reproduce them.

Organic Impairment of 
Central Nervous System 

Beck and Lam (200) found that WISC Full Scale IQ's of diagnosed organics were lower than those of nonorganics, but failed, as others have, to verify Wechsler's subtest diagnostic pattern for organics. Young and Pitts (202) compared the WISC scores of 40 rural juvenile congenital syphilitics (aged 6 to 16 years) with 40 normal controls matched on age, sex, race, region, and father's occupation. The controls were significantly superior on IQ's and on Vocabulary, but not on Block Design, where the critical ratio was marginal.

Gifted 

In Edmonton, Chalmers (213) administered the WISC to 57 superior children with IQ's above 120 (mean FS IQ 128) and found that 11 obtained perfect scores on one or more tests. However, there were no perfect scores on Vocabulary and only one on Block Design. Nevertheless, Chalmers questioned the adequacy of the WISC ceilings for precise measurement in the very high range. Trauba (214), with a similar sample of 71 gifted Kansas children, found that WISC Vocabulary has a correlation of 0.71 with the McCall-Crabbs Standard Test Lessons in Reading. Lucito and Gallagher (215) obtained a mean WISC Full Scale IQ of 141 for a sample of 50 children whose mean S-B IQ was 161. In this group the boys' scores were slightly higher than those of the girls. In agreement with Gallagher and Lucito (164), mentioned earlier, Similarities, Information, and Vocabulary were the three highest tests for boys and girls. Object Assembly, Coding, and Picture Arrangement were lowest for boys, while Digit Span, Picture Arrangement, and Picture Completion were lowest for girls (only partially in agreement with Gallagher and Lucito).



The adequacy of the WISC for precise measurement of the gifted may be questioned, but it is possible that more accurate measurement may be obtained with the present short form of Vocabulary and Block Design than with the Full Scale, since in Chalmers' sample these were the two subtests least affected by the test ceiling. This is a problem, however, that will require further attention.

Mentally Retarded and Defective 

The research on the use of the WISC with retarded and defective groups is very favorable, in contrast with research on its use for the gifted. This is indicated by virtually all the studies reviewed: (a) reliabilities reported: Throne and others (227) obtained retest reliabilities over 3 to 4 months of 0.79 for Vocabulary and 0.82 for Block Design on a sample of 39 retarded boys aged 11 to 14 years; (b) correlations of the WISC with other tests: Stanford-Binet (216, 217, 228, and 229), Leiter International Performance Scale (221 and 229), Wechsler Adult Intelligence Scale (222), Columbia Mental Maturity Scale (224), Goodenough Draw-A-Man Test (224), Progressive Matrices (233), Peabody Picture Vocabulary Test (234), and grade placement (238); (c) patterning studies, mentioned earlier; (d) absence of sex differences (232); and (e) amenability to short forms based on Vocabulary and Block Design, as discussed above. (See Research on Short Forms of the WISC.) Differences between WISC and Stanford-Binet IQ's are smaller in this range than in any other. It appears that estimates of retardation in the population can justifiably be based on a composite score of Voc. and BD, but the desirability of further research to develop a conversion table to the Full Scale should not be minimized.
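One conventional way to form such a composite is sketched below; it is not the Survey's procedure or a published WISC conversion table. The two scaled scores are summed and the sum is re-expressed on a deviation scale with mean 100 and standard deviation 15. The standard deviation of the sum depends on the correlation between Vocabulary and Block Design, which the sketch takes as a parameter. The value 0.5 used below is an assumption, not a figure from the WISC manual.

```python
import math

def voc_bd_composite(voc: float, bd: float, r_voc_bd: float) -> float:
    """Deviation-style composite (mean 100, SD 15) from two WISC scaled scores.

    Assumes each scaled score has mean 10 and SD 3 in the reference population
    and that the two subtests correlate r_voc_bd there.
    """
    mean_sum = 20.0
    sd_sum = 3.0 * math.sqrt(2.0 + 2.0 * r_voc_bd)  # SD of the sum of two scores
    return 100.0 + 15.0 * (voc + bd - mean_sum) / sd_sum

# Hypothetical child with Voc. = 4 and BD = 6; r_voc_bd = 0.5 is assumed.
print(round(voc_bd_composite(4, 6, r_voc_bd=0.5), 1))
```

Setting the correlation to 1.0 would treat the two subtests as interchangeable and make low composites look less extreme than they are. That sensitivity to the assumed correlation is one reason a proper conversion table, based on the full standardization data, remains desirable.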

Bilingual 

The effect of bilingualism appears to be in the direction of lowering the Vocabulary scores; no effects have been reported on Block Design. Altus (240) reported such results for Mexicans in California; Kralovich (241), for children of Slavic origin in New Jersey; and Levinson (243 and 244), for Jewish children in New York. Kralovich reported a correlation of 0.61 between the Verbal and Performance scales of the WISC for 28 monolinguals and -0.04 for 28 bilinguals. Where bilingualism is known to exist, verbal tests may be expected to be invalid measures, and greater reliance on performance-type tests such as Block Design and Draw-A-Man is indicated.
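Whether two independent correlations such as Kralovich's 0.61 and -0.04 differ by more than sampling error can be checked with the usual Fisher z test. The sketch below assumes the two groups of 28 are independent random samples; it is an illustration, not a reanalysis reported by Kralovich.

```python
import math

def compare_independent_r(r1: float, n1: int, r2: float, n2: int):
    """Two-tailed z test for the difference between two independent Pearson
    correlations, using Fisher's r-to-z transformation."""
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-tailed p from the normal curve
    return z, p

# Kralovich (241): VS-PS correlation for monolinguals vs. bilinguals.
z, p = compare_independent_r(0.61, 28, -0.04, 28)
print(f"z = {z:.2f}, two-tailed p = {p:.3f}")
```

With these figures the test gives z of about 2.65, beyond the 0.01 two-tailed criterion of 2.58. The same function could be applied to Post's (198) correlations of 0.26 and 0.60 for two groups of 30.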

Negro 

The WISC norms do not apply to Negro children, and research by Young and Bright (251), Caldwell (252), Blakemore (253), and Racheile (254), as well as others, does nothing to alter this fact. Negroes score lower than whites, and it is generally accepted that cultural experience and caste factors not only account for the Negro-white differences, but also render comparable measurement by culture-fair or culture-free methods as difficult as other ethnic comparisons. The sampling designs of the studies cited, which used the WISC, were not adequate to qualify them for any detailed comment on differences found.

Socioeconomic Status 

Laird (250) compared children of different socioeconomic status (SES) on the WISC and noted, in common with the general trend in the literature, superior performance at upper levels. Estes (247 and 248) found similar differences at grade 2 but not at grade 5. At both grades the WISC Full Scale IQ was more highly correlated with the Metropolitan Achievement Test for the higher SES sample.

COMPARISON OF WISC 
AND STANFORD-BINET IQ'S 

Despite the theoretical objections to the men- 
tal age concept, discussed earlier, which led to 
the adoption of the deviation IQ as a distinctive 
feature of the Wechsler scales and which set 
them apart from the venerable Stanford-Binet 
test, the relation of the WISC to the S-B has been 
a matter of great interest, as evidenced by the 
number of papers on this topic in the present re- 
view. 
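The distinction between the two kinds of IQ can be shown in a few lines. The ratio IQ of the older Stanford-Binet forms divides mental age by chronological age; Wechsler's deviation IQ locates a score within the child's own age group on a scale with mean 100 and standard deviation 15. The sketch below simplifies the WISC procedure (which works through subtest scaled scores and their sum), and the example inputs are hypothetical.

```python
def ratio_iq(mental_age_months: float, chronological_age_months: float) -> float:
    """Classical ratio IQ: 100 * mental age / chronological age."""
    return 100.0 * mental_age_months / chronological_age_months

def deviation_iq(score: float, age_group_mean: float, age_group_sd: float) -> float:
    """Deviation IQ on Wechsler's metric (mean 100, SD 15), computed
    relative to the mean and SD of the child's own age group."""
    return 100.0 + 15.0 * (score - age_group_mean) / age_group_sd

# Hypothetical child aged 6-0 (72 months) with a mental age of 8-0 (96 months).
print(round(ratio_iq(96, 72), 1))  # 133.3
# Hypothetical score one SD above the age-group mean.
print(deviation_iq(60, 50, 10))    # 115.0
```

Because the deviation IQ is defined within each age group, its spread is the same at every age by construction, which a ratio IQ does not guarantee.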

The Stanford-Binet is indeed one of the giants among psychological tests, a veritable landmark in the history of psychological measurement, and still enjoys extensive school and clinical use, notwithstanding the fact that its popularity has been somewhat reduced by the success of the relatively recent WISC. Although the standardization of the WISC has been impressive and supported by sophisticated conceptualization, many users have been relieved to find that it is highly correlated with the Stanford-Binet. The correlation is in fact so high (accounting for over 80 percent of common variance) that one wonders about the significance of the theorizing which describes them so differently.
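The "common variance" figure is the square of the correlation coefficient, so a common variance above 80 percent implies a correlation of roughly 0.9 or higher. A short check, using illustrative correlations rather than values from any particular study:

```python
import math

def common_variance(r: float) -> float:
    """Proportion of variance two measures share: the squared correlation."""
    return r * r

# Smallest correlation whose common variance exceeds 80 percent.
r_needed = math.sqrt(0.80)
print(f"{r_needed:.3f}")  # 0.894

# Illustrative correlations (not taken from any particular study).
for r in (0.85, 0.90, 0.95):
    print(f"r = {r:.2f}: common variance {common_variance(r):.0%}")
```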

The impression of similarity of measurement results given by the correlations does not, however, stand up when mean scores of different groups are compared. As noted earlier, WISC IQ's tend to be lower than Stanford-Binet IQ's at the lower age levels and among the gifted. These observations are illustrated by data extracted from the following 12 studies in which comparison means were cited: 119, 120, 124, 147, 148, 151, 153, 156, 159, 161, 215, and 216. Their results are summarized briefly on the following page. Data from Jones' (154) British study of 240 children in the age range 8 to 10 years are also of interest. For this group the WISC means were, on the average, 7.2 IQ points below the S-B, the WISC always being administered first.

Normal (White) Samples

Studies: Schachter and Apgar (147)(a); Triggs and Cartee (148);
Muhr (119); Pastovic and Guthrie (120); Rottersman (151);
Cohen and Collier (124); Wagner (156); Frandsen and Higginson (159);
Kardos (161); Beeman (153).

Sample                                        Mean S-B   Mean WISC
Mean age 4-1 and mean age 8-3;
  N 113 (61 m, 62 f)                            104.3       98.9
Kindergarten, age 5; N 48                       124.1      137.6
5-year group; N 21                               97.4       88.1
6-year group; N 21                              102.2       96.6
5-year group; N 50                              113.0      103.2
7-year group; N 50                              115.1      111.5
6-year group; N 50                              110.2      101.5
6- to 9-year group, ages 6-5 to 8-9; N 53       104.8       99.8
8- to 9-year group; N 50                        104.5      103.3
9-year group; N 50                              105.8      102.4
13- to 14-year group; N 100                     113.7      109.4

Gifted (White) Samples

N 36, mean WISC compared with mean S-B (full sample;
  IQ over 130; IQ 120-129): -15, -20
Lucito and Gallagher (215); N 50                160.8      141.2

Retarded Samples

Nale (216), 9- to 11-year group; N 104           55.4       58.0

(a) Interval between S-B and WISC administration, 50 months.
NOTE: N = number; m = male; f = female.

Allowing for sampling fluctuations and errors
of measurement in routine testing, there
nevertheless appears to be a common trend in these
reports which can be summarized as follows. The
differences between WISC and S-B IQ's are
greatest among the gifted. In the normal range they are
high among the very young, dropping off as age
increases, but persisting to some degree
throughout the age range 5 to 14 years. The data suggest
an upturn after age 9, but this is not certain. No
significant differences appear for the subnormal.
The schematic chart in figure 1 suggests the
nature of the age- and level-related difference
functions on the basis of the results cited.
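The signed differences summarized here can be read directly off the tabulated means. A minimal Python sketch, using three of the printed pairs (the sample labels are descriptive, not the source's), computes WISC minus S-B so that a negative value means the WISC mean ran lower:

```python
# Mean IQs (S-B, WISC) for three samples from the comparison table.
samples = {
    "Normal sample, mean ages 4-1 and 8-3 (N 113)": (104.3, 98.9),
    "Gifted sample (N 50)": (160.8, 141.2),
    "Retarded sample (N 104)": (55.4, 58.0),
}


def wisc_minus_sb(sb: float, wisc: float) -> float:
    """Signed difference; negative means the WISC mean is lower than the S-B mean."""
    return round(wisc - sb, 1)


for name, (sb, wisc) in samples.items():
    print(f"{name}: {wisc_minus_sb(sb, wisc):+.1f}")
```

The gifted pair shows the largest gap and the retarded pair a small positive one, which matches the pattern described in the text.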

Unfortunately it is possible only to speculate 
on the nature of the true curves which those in 
figure 1 are intended to suggest, and speculation 
on what they would be for a short form composed
only of Vocabulary and Block Design is difficult.
Some of the data presented earlier for these sub- 
tests suggest that the differences might be small- 
er, but in the absence of empirical evidence this 
is only an educated guess. 

For the purposes of the Survey there are
only two alternatives. One is to carry out some
ad hoc research on the short form, as suggested
earlier, for the purpose of estimating the Full
Scale IQ from Voc. and BD, using the results to
conform to Wechsler's norms. The other is to
regard the full Survey sample as an unprecedented
opportunity to carry out a complete new
standardization of the short form on a basis that, in
sampling sophistication, far exceeds any work of its
kind in the history of testing. There are a number
of problems related to the second alternative,
including the availability of funds for this purpose.
However, if this standardization were
accomplished, the new norms for Voc. and BD would be
superior to those now available, and the
computations of FS IQ based on them would permit more
accurate population estimates than any others
covering the age range included.

SUMMARY AND CONCLUSIONS 

This review is based on 154 published studies,
reviews, and unpublished theses and
dissertations related to the WISC, interpreted in a frame
of reference of measurement theory and
psychometric principles. The evidence considered
strongly supports the judgment of the Survey
staff in the selection of the WISC Vocabulary and
Block Design subtests as a short form of the WISC
for the national survey, but at the same time it
raises questions concerning the acceptance of
either the scaled scores of these subtests or of
prorated Full Scale Intelligence Quotients based
on them without further empirical research. It
is the reviewer's considered opinion that, given
the alternatives presented, the selection was an
eminently wise one. The research recommended
reflects principally the nature of the
unprecedented testing problems and the generally imprecise
nature of psychological measurement.

The most important recommended
investigations discussed in this section involve the
following steps:

1. Restandardization of the Vocabulary and
Block Design tests on the full Survey
sample. As part of this study, item
difficulties should be checked and a formula or
set of formulas should be developed for
estimating Full Scale IQ's from revised
Voc. and BD scaled scores (based on
samples of normal, gifted, and retarded
groups and, if possible, several ethnic
groups, such as Negroes or Mexicans, to
whom the Full Scale has been
administered). Consideration should be given
to estimation of IQ's directly from raw
scores by age group.

2. Research on correlates of a Voc.-BD
ratio, for use with the WRAT and with the
Draw-A-Man Test in the identification of
poor readers, bilinguals, and verbally
impaired children and in estimating IQ's of
culturally deviant ethnic groups.

3. Cross-disciplinary developmental
analyses of Vocabulary, Block Design, and
derived scores and of item responses with
biomedical data obtained in other sections
of the Survey. This area is discussed in
detail elsewhere. See Klausmeier and
Check (166).
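Item 1 calls for formulas that estimate Full Scale IQ from Vocabulary and Block Design scaled scores. One common psychometric approach, shown here only as an illustration and not as the procedure the Survey adopted, rescales the sum of the two scaled scores to a mean of 100 and a standard deviation of 15, using the variance of a sum of correlated scores. The sketch assumes subtest scaled scores with mean 10 and SD 3; the intercorrelation of 0.5 in the usage lines is a hypothetical value, and in practice it would come from the restandardization sample.

```python
import math


def short_form_iq(voc_scaled: int, bd_scaled: int, r_voc_bd: float) -> float:
    """Rescale the sum of two scaled scores to an IQ metric (mean 100, SD 15).

    Assumes each subtest scaled score has mean 10 and SD 3 in the norm group,
    and that r_voc_bd is the Vocabulary/Block Design intercorrelation there.
    """
    n = 2
    composite = voc_scaled + bd_scaled
    composite_mean = 10 * n
    # SD of a sum of n subtests, each with SD 3: 3 * sqrt(n + 2 * sum of intercorrelations)
    composite_sd = 3 * math.sqrt(n + 2 * r_voc_bd)
    return 100 + 15 * (composite - composite_mean) / composite_sd


print(round(short_form_iq(10, 10, 0.5)))  # 100
print(round(short_form_iq(13, 13, 0.5)))  # 117
```

A regression of Full Scale IQ on the two scaled scores, fitted in a sample given the full WISC, is the usual alternative when the aim is to predict rather than to rescale.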
II. THE WIDE RANGE ACHIEVEMENT TEST,
THE ORAL READING AND ARITHMETIC SUBTESTS



The requirement of the Survey for an
individually administered, brief, well-standardized,
reliable, valid, and flexible school achievement
test was filled by the selection of the Reading
and Arithmetic subtests of the 1963 revision of
the Wide Range Achievement Test. The 1963
WRAT, by J.F. Jastak, replaces the original 1946
edition by Jastak and S.W. Bijou and appears to
be quite similar to the original in design and item
content, except that the new edition is divided, for
the convenience of users, into two levels (Level I
covers ages 5 to 12 years; Level II, 12 years
through adulthood), in contrast with the broad
sweep of the original, from kindergarten through
adulthood.

The principal difference between the two
editions appears to be in the method of
standardization. The 1946 norms were computed to conform
to those of the New Stanford Achievement Test
(Reading, to New Stanford Word and Paragraph
Reading, and Arithmetic Computation, to New
Stanford Arithmetic Computation), whereas the
1963 norms, in each age bracket, depend on
"probability samplings based on IQ's ... that
would correspond to the achievement of mentally
average groups with representative dispersions
of scores above and below the mean" (301).

The purpose of this section is both to review
the literature on the WRAT and to evaluate it in
relation to its suitability for the objectives of the
Survey. Unfortunately this must be done almost
entirely on the basis of the tests, manuals, and
research available on the 1946 edition, which is
itself extremely limited. Appropriate data for
critical evaluation of the 1963 edition are almost
totally lacking. Although released for sale in 1963,
the test manual for this edition was still
incomplete in June 1964 (301), and no independent data
on validity have been found.

EVALUATIVE CRITERIA 

Measurement experts believe that in
addition to the standard questions concerning such
issues as reliability, validity, representativeness
of standardization sample, and agreement of
norms with criterion levels, some problems are
inherent in the wide-range type of design. These
are stated forthrightly by Chauncey and Dobbin
(310), in a discussion of various defects of tests:

The "wide-range" test ... is the too-short
test in disguise. There are only a few of them 
around. They are promoted as being suitable 
measures of ability (or achievement) for 
people of many ages — from third grade 
through second year of college, for example. 
Since only a small part of any such test can be 
material suitable in difficulty for one indi- 
vidual, the effective part of the test may 
amount to no more than half a dozen ques- 
tions — making it a very short test, indeed. 

These remarks, by the president and one of
the project directors of the Educational Testing
Service, in a book written expressly to defend
educational testing at a time when it is under
attack from many sources, command the attention
and concern of users of wide-range tests such
as the WRAT. The particular implication of the
critique is that reliabilities, validities, and score
levels must be evaluated at every level covered
(or at least at every level at which the test is
used) and that broad-band coefficients of
reliability and concurrent validity are likely to be
misleading.
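The Chauncey and Dobbin point can be made quantitative with the Spearman-Brown formula, which predicts how reliability changes when a test is lengthened or shortened by a factor k. The minimal sketch below assumes the items are parallel; the 40-item length and the 0.95 broad-band coefficient are hypothetical values chosen only to illustrate the "half a dozen questions" case.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability when a test is lengthened/shortened by length_factor.

    Assumes the added or removed items are parallel to the existing ones.
    """
    k = length_factor
    return k * reliability / (1 + (k - 1) * reliability)


full_test_r = 0.95     # hypothetical broad-band reliability
full_items = 40        # hypothetical number of items on the whole test
effective_items = 6    # "half a dozen questions" suited to one examinee
r_eff = spearman_brown(full_test_r, effective_items / full_items)
print(f"{r_eff:.2f}")
```

Under these assumptions, a broad-band coefficient of 0.95 would correspond to an effective reliability of only about 0.74 for the handful of items appropriate to one examinee, which is why level-by-level coefficients matter.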

The problem of selecting a suitable
achievement test for the Survey is highly complex. Time
restrictions favor short forms and short-cut
methods (such as the wide-range approach),
provided that they meet reasonable standards of
acceptability. However, it is just as true in
testing as in all other areas that "you cannot get
more out than you put in." Compromises with
reality in testing often mean less reliable
measures and less adequate coverage of appropriate
universes of content; sometimes they mean
penalties in relation to validity and consequent
generalizability of measures.

The application of these points to the WRAT
is considered as judiciously as possible in this
review, and the reality demands are weighed against
possible shortcomings of this wide-range test in
relation to alternatives available in the situation.
A brief review of the 1946 edition and the general
conceptualization of the WRAT is followed by a
review of the 1963 edition used in Cycle II.

1946 EDITION OF WRAT 

The conceptualization and rationale of this
test (302) could not help but appeal to clinical
psychologists in schools and mental health services.
Jastak made an extremely strong case for the 
clinical use of his test, and it is not surprising 
that the WRAT has enjoyed considerable popu- 
larity in clinical circles despite psychometri- 
cians' prejudice against wide-range tests. 

Jastak's arguments are briefly as follows: 

1. A thorough psychological examination
should include tests of school
fundamentals as well as intelligence tests.
Intelligence tests account for only a portion
of the variance in school achievement, and
failure in school and life adjustment may
result from factors other than low
intelligence.

2. Reliable (and valid) school tests should be
used to assess discrepancies between
intellectual capacity and performance in
basic school subjects as well as
discrepancies in the organization of learning
abilities. Wide-range discrepancies in
school achievement are the rule rather
than the exception, and their discovery is
important for the understanding of
personality and school performance problems
and for the institution of proper remedial
programs.

3. Clinically recognized discrepancy
patterns in children are illustrated by the
tendency of neurotic and disorganized
children to be more proficient in reading
than in arithmetic. In addition, "if
neurotic tendencies and special reading
handicaps occur together the child may
function far below the level of his true
capacity in all school subjects." Of course,
failure in reading and in arithmetic may
also reflect unrelated processes.

Jastak's criteria of a satisfactory school
achievement test for (individual) clinical use are
(a) low cost, (b) individual standardization, (c)
ease and economy of administration, (d)
suitability of contents, (e) relevance of the functions
studied, and (f) comparability of results over the
entire range of the skills in question. It is
apparent that these criteria do in effect exclude such
standard school achievement batteries as the
Stanford, Iowa, Cooperative, and other well-known
and highly respected batteries that are designed
for group administration within a narrow grade
range and cover a large universe of content,
requiring considerable time to administer and
score. These criteria certainly appear to be
"tailor made" for the Survey (as well as for
clinical practice). However, in view of the
testing conditions for individually selected members
of the national sample, the question is, how well
are they implemented in the WRAT?

Jastak's views on test content are of
particular interest. The WRAT focuses entirely on
three basic school-study skills (reading, spelling,
and arithmetic) "around which most school
studies revolve." The range of the subtests for each
is indeed wide, from kindergarten to college.

The test content is concerned principally
with mastery of the mechanics of the subject
rather than with comprehension. Thus the reading
test is in effect a test of reading as a motor
skill; the spelling test focuses on words without
sentence contexts; and the arithmetic test
involves number facility with minimal dependence
on reading.

This emphasis is a reflection of the author's
conception of the WRAT as an adjunct to tests of
intelligence and behavior adjustment. Information
concerning the subject's ability to comprehend
can be obtained from intelligence tests, but
accurate measurement of mechanics in the basic
tools chosen is essential because of the
dependence of most other studies on them. Further, it
is argued that correct answers can often be given
in conventional reading, arithmetic, and other
subject-matter achievement tests on the basis of
general knowledge and intellectual ability, even
when mastery of mechanics is poor; thus,
important diagnostic cues are overlooked.

Although the WRAT Reading and Arithmetic 
tests were reported to correlate satisfactorily 
with other achievement tests, their limitations of 
content and intended use were clearly outlined in 
the manual. 

As stated above, the 1946 edition of the WRAT
was standardized by anchoring the WRAT norms to
those of corresponding subtests of the New
Stanford Achievement Test. The standardization
sample consisted of the scores of 4,052 students
for Spelling and Arithmetic (about 1,500 were
individually tested; the remainder were tested in
groups) and 1,429 students, individually tested,
for Reading. Reliability coefficients (retest) were
reported as 0.95 for Reading (N=110) and 0.90
for Arithmetic (N=120). The Reading section of
the New Stanford Achievement Test was reported
to have correlated 0.81 with Paragraph and Word
Reading; the Arithmetic section of the Stanford
test correlated 0.91 with Arithmetic Computation.

The detailed composition of the various
samples was not reported in the 1946 manual, and
the validation data were not specified by age level
as would be required to conform with the
evaluative criteria discussed above. This was not
exceptional in 1946, however, when the professional
demands for rigorous reporting of critical
information by test publishers were less stringent
than they are today.

Nevertheless, despite the absence of
comprehensive statistical information, the WRAT
became a favorite of a large number of clinicians,
and its use was extensive in the United States
and abroad within a short time of its publication.
It may appear surprising that so popular a test
generated so little research. However, it appears
that the principal use of the test was by clinicians
whose attitudes toward tests are usually validated
more by clinical experience than by statistics
and whose opportunities and motivations to
conduct and publish research are generally limited.

RESEARCH ON THE 1946 WRAT 

It is noteworthy that only seven research re- 
ports have been found dealing with the 1946 edi- 
tion and that of these seven, two were unpublished 
mimeographed papers (50v^ and 306) furnished by 
Dr. Jastak. Reliability coefficients and corre- 
lations of the WRAT with other tests, abstracted 
from these reports and the two test manuals (301 
and 302), are reported in tables 4 and 5. 

Reading 

Hopkins, Dobson, and Oldridge (304) quoted
Sundberg (312), in a 1961 paper, to the effect that
although the WRAT was the second most popular
achievement test in clinics, Sundberg could not
find a single empirical study of it. They
administered the Reading subtest to 502 children in
grades 1 to 5 and correlated the scores with
teacher ratings and scores on the California
Reading Test (CRT). The correlations with teacher
ratings were high for grades 1 to 5: 0.79, 0.74,
0.86, 0.86, and 0.85, respectively. The correlations
with the total score of the California Reading
Test were 0.86 for grade 3 and 0.71 for grade 5.
The mean grade placements on the WRAT, for
the five grades in order, were 1.4, 2.4, 3.5, 4.1,
and 4.7.

Wagner and McCoy (303) reported
correlations of the WRAT Reading subtest with the
Sangren-Woody Silent Reading Test (grade level)
for two samples, one of 29 fifth graders and the
other of 57 primary school juvenile offenders.
The correlations were 0.78 and 0.74. In the first
sample, the WRAT Reading correlated 0.78 with
both teacher ratings and rank order of
midterm grades. The correlation with the Stanford
Reading Test, in the second sample, was 0.80.
Table 4. Studies reporting reliability coefficients of the WRAT

                                              Reading          Arithmetic
Age range                                     N      r         N      r

Jastak and Bijou (302), 1946. Normals(a); test-retest
N.R.                                          110    0.95      120    0.90

Jastak (301), 1963. Standardization population; Form I with Form II, Level II
20+ years                                     200    0.99      200
18-19 years                                   200    0.98      200
16-17 years                                   200    0.99      200
15 years                                      200    0.99      200
14 years                                      200    0.99      200
13 years                                      200    0.99      200
12 years                                      200    0.99      200

Jastak (301), 1963. Standardization population; Form I with Form II, Level I
11 years                                      200    0.99      200
10 years                                      200    0.99      200
9 years                                       200    0.99      200
8 years                                       200    0.99      200
7 years                                       200    0.99      200
6 years                                       200    0.99      200
5 years                                       200    0.98      200

Jastak (301), 1963. Standardization population; split-half
14-0 to 14-11                                  89    0.88      8?
13-0 to 13-11                                 224    0.90      194
12-6 to 12-11                                 180    0.94      165
12-0 to 12-5                                  179    0.92      164
11-6 to 11-11                                 252    0.91      225
11-0 to 11-5                                  197    0.91      191
10-6 to 10-11                                 214    0.93      195
10-0 to 10-5                                  207    0.90      190
9-6 to 9-11                                   165    0.91      160
9-0 to 9-5                                     81    0.90       78

(a) Level of subjects and time interval between tests not reported.
NOTES: All correlation coefficients are Pearson product-moment unless
otherwise specified. N.R.: not reported.

Table 5. Studies reporting correlation between the WRAT and other measures

Investigator, year: test or criterion variable
    Subjects(a)                          Age range       N     r

WRAT Reading Test

Smith (126), 1961: Full Range Picture Vocabulary Test
    Normals, grade 2                     6-11 to 8-10   257   0.42
Hopkins, Dobson, and Oldridge (304), 1962: California Achievement Test
  Reading Vocabulary
    Normals, grade 3                     N.R.           171   0.83
    Normals, grade 5                     N.R.            86   0.67
  Reading Comprehension
    Normals, grade 3                     N.R.           171   0.84
    Normals, grade 5                     N.R.            86   0.67
  Total Reading
    Normals, grade 3                     N.R.           171   0.86
    Normals, grade 5                     N.R.            86   0.71
Smith (126), 1961: California Test of Mental Maturity
    Normals, grade 2                     N.R.                 0.47
Lawson and Avila (305), 1952: Gray Standardized Oral Reading Paragraphs Test
    Defectives                           16-45 years     30   0.94(b)
Reger (307), 1962: Metropolitan Achievement Tests, Reading
    Retarded boys                        9-9 to 14-6     25   0.76
Wagner and McCoy (303), N.R.: Midterm grades
    Normals, grade 5                     N.R.            29   0.78 (rank order)
Jastak and Bijou (302), 1946: Stanford Achievement Test, Reading
  Word Meaning
    Normals, grades 7 and 8              N.R.           389   0.84
  Paragraph Meaning
    Normals, grades 7 and 8              N.R.           309   0.81
Wagner and McCoy (303), N.R.: Sangren-Woody Reading Test
    Normals, grade 5                     N.R.            29   0.78
    Juvenile offenders                   N.R.            57   0.74
Wagner and McCoy (303), N.R.: Stanford Reading Tests
    Juvenile offenders                   N.R.            57   0.80
Wagner and McCoy (303), N.R.: Teacher rating of reading ability
    Normals, grade 5                     N.R.            29   0.78
Hopkins, Dobson, and Oldridge (304), 1962: Teacher rating of reading ability
    Normals, total                                      502
    Normals, grade 1                     N.R.            90   0.79
    Normals, grade 2                     N.R.           106   0.74
    Normals, grade 3                     N.R.           171   0.86
    Normals, grade 4                     N.R.            49   0.86
    Normals, grade 5                     N.R.            86   0.85
Smith (126), 1961: Wechsler Intelligence Scale for Children
    Normals, grade 2: Verbal Score       N.R.                 0.55
    Normals, grade 2: Performance Score  N.R.                 0.47
    Normals, grade 2: Full Score         N.R.                 0.61

WRAT Arithmetic Test

Holowinsky (309), 1961: California Reading Test
    Normals and retarded                 12-17 years    600   0.61
Murphy (306), 1961: First-quarter grades
    Normals, total                                      241
    Normals, grade 5                     N.R.           135   0.64
    Normals, grade 6                     N.R.           106   0.56
Holowinsky (309), 1961: Grade placement
    Normals and retarded                 12-17 years    600   0.31
Reger (307), 1962: Metropolitan Achievement Tests, Arithmetic
    Retarded boys                        9-9 to 14-6     25   0.87(b)
Jastak and Bijou (302), 1946: Stanford Achievement Tests, Arithmetic Computation
    Normals, grades 7 and 8              N.R.                 0.91
Holowinsky (309), 1961: Otis Quick-Scoring Mental Ability Tests
    Normals and retarded                 12-17 years    600   0.30
    Normals and retarded                 12-13 years          0.59
    Normals and retarded                 13-14 years          0.39
    Normals and retarded                 14-15 years          0.54
    Normals and retarded                 15-16 years          0.02
    Normals and retarded                 16-17 years          0.09
Murphy (306), 1961: Stanford Achievement Tests, Arithmetic
    Normals, total                                      241
    Normals, grade 5                     N.R.           135   0.59
    Normals, grade 6                     N.R.           106   0.35
Murphy (306), 1961: Stanford Achievement Tests, Arithmetic, and school grades
    Normals, total                                      241
    Normals, grade 5                     N.R.           135   0.75 (multiple r)
    Normals, grade 6                     N.R.           106   0.70 (multiple r)

(a) Subjects are white Americans unless otherwise specified.
(b) Spurious correlation with age for small N.
NOTES: All correlation coefficients are Pearson product-moment unless
otherwise specified. N.R.: not reported; r: correlation.
The report by Lawson and Avila (305) of a
correlation of 0.94 between the WRAT Reading
subtest and the Gray Oral Reading Test,
administered to a sample of retarded adults ranging
widely in age and IQ, is probably inflated because
of the nature of the sample. Similarly, Reger's
(307) sample of 25 emotionally disturbed,
retarded boys (age range 9-9 to 14-6) is also quite
a diverse population. Reger reported a correlation
of 0.76 between the WRAT Reading subtest and
the Metropolitan Achievement Test.

Holowinsky (309) had an apparently
well-designed sample of 600, including 75 children at
each age from 12 to 16 years. Each group was
divided into three categories on the basis of IQ
scores: 80-89, 90-99, and 100-109. For the total
sample of 600 children, the California Reading Test
correlated 0.61 with the WRAT Arithmetic subtest.
Students of lower intellectual ability tended to show
better achievement in arithmetic than in reading.
For the total sample of 600 children the WRAT
had a correlation of 0.31 with grade placement.

These limited results tend to support the
claims for the WRAT with regard to concurrent
validity both with other reading tests and with
grade placement. The evidence is far from
sufficient to permit definitive evaluation, and the lack
of information on many points is obvious. However,
no contrary evidence was found, and as far as these
papers are concerned, the report for the WRAT
Reading subtest is favorable.

Arithmetic 

The most adequate independent study of the
WRAT Arithmetic subtest is that of Murphy (306),
who tested 135 fifth and sixth graders (with
average IQ of 114) with the WRAT and the
Stanford Achievement Test (SAT). The correlation of
the two tests was 0.59 for grade 5 and 0.35 for
grade 6. The correlations between Arithmetic
grades and the WRAT were 0.64 for grade 5 and
0.56 for grade 6. Correlations between the SAT
and Arithmetic grades were 0.68 for grade 5 and
0.59 for grade 6. In Reger's sample, noted above
(307), the WRAT Arithmetic test had a correlation
of 0.87 with the Metropolitan Achievement Test.
Holowinsky's study mentions a correlation of 0.59
between the IQ scores of 12-year-olds and the
WRAT Arithmetic subtest, as compared with 0.71
for the Reading subtest.
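Whether Murphy's drop from 0.59 (grade 5) to 0.35 (grade 6) exceeds sampling fluctuation can be checked with Fisher's r-to-z test for two independent correlations. A minimal Python sketch, taking the grade-group sizes listed in Table 5 (135 and 106) as the N behind each coefficient, which is an assumption about how the correlations were computed:

```python
import math


def fisher_z(r: float) -> float:
    """Fisher's r-to-z transformation."""
    return math.atanh(r)


def compare_independent_rs(r1: float, n1: int, r2: float, n2: int):
    """Two-tailed test that correlations from two independent samples are equal."""
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (fisher_z(r1) - fisher_z(r2)) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-tailed normal p-value
    return z, p


z, p = compare_independent_rs(0.59, 135, 0.35, 106)
print(f"z = {z:.2f}, p = {p:.3f}")
```

With these inputs the difference comes out at z of about 2.4 (two-tailed p near 0.02), so it is unlikely to be sampling noise alone, though the test says nothing about why the grade 6 correlation is lower.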

These results are less satisfactory than 
those for Reading in the respect that the corre- 
lations reported compare less favorably with those 
mentioned in the manual. This type of cross- 
validation is imperative and demonstrates the 
importance of independent reports to supplement 
the data provided in a test manual. To Dr. 
Jastak's credit, however, it should be noted that 
the Murphy report, in which the lower
correlations appear, is an unpublished paper which he,
Dr. Jastak, furnished unsolicited for this review.
These studies are insufficient for an evaluation of
the WRAT Arithmetic subtest, to be sure. As the
only information available, they leave the case for
the Arithmetic test without strong independent
support.



1963 EDITION OF WRAT 

Two major changes appear in the 1963
edition. One is the division of the test into two levels.
Level I covers the age range of 5 to 12 years;
Level II covers the age range 12 years through
adulthood. It is pointed out in the mimeographed
manual for this edition that this change not only
has reduced the time of test administration, but
also has increased the number of items at each
level, thereby increasing "the already high
reliability" of the test. Indeed, the test has been
lengthened, and the reliabilities have been listed
for samples of 200 each for ages 5 through 11
years (Level I). For Reading, all ages except
5 years correlate 0.99 (age 5 correlates 0.98).
Similarly computed reliabilities
for Arithmetic are listed at or above 0.94, with
the highest correlation, 0.97, occurring at 5 years
of age. Since these coefficients are based on
correlations between two forms of the test, they are
considered by the authors to be inflated. The text
of the reliability section of the manual (301, p.
47) states that the reliability coefficients are
more likely within the range 0.90 to 0.95 with a
mean of 0.92. At this level, they do not seem
perceptibly higher than the reliabilities reported
in the 1946 manual.
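The practical weight of the difference between the listed 0.99 coefficients and the manual's own estimate of about 0.92 shows up in the standard error of measurement, SEM = SD × √(1 − r). A minimal sketch, assuming a standard-score scale with SD 15 (an assumption for illustration, not a figure taken from the manual):

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)


SD = 15  # hypothetical standard-score SD
for r in (0.99, 0.92, 0.90):
    print(r, round(standard_error_of_measurement(SD, r), 2))
```

On that scale, moving from r = 0.99 to r = 0.92 nearly triples the SEM, from 1.5 to about 4.2 points.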

The second major change is in method of
standardization. The 1963 manual (301) describes
the development of norms and the normative
population sample as follows:

The revised WRAT was administered to
school children and adults in a number of
states: Delaware, Pennsylvania, New Jersey,
Maryland, Florida, Washington, and
California. No attempt was made to obtain a
representative national sampling. Nor is
such a sampling considered essential for
proper standardization. (italics added)

The groups of children were selected from
schools of known socioeconomic levels. The
IQ's of the children were also known from
group tests such as the Lorge-Thorndike, the
Kuhlmann-Anderson, and the California
Mental Maturity Test, administered at the
schools. Many of the cases (over 1,000) in
the standardization group had been given
individual tests such as the Stanford-Binet,
Wechsler Intelligence Scale for Children,
and others. In each age bracket, probability
samplings based on IQ's were studied to
develop WRAT norms that would correspond to
the achievement of mentally average groups
with representative dispersions of scores
above and below the mean. (italics added)

From the standpoint of the Health
Examination Survey, with particular reference to Cycle
II (children aged 6-11 years), the first of the two
changes mentioned is an advantage. The age
range of Level I fits the age range of Cycle II
perfectly, and the increased length of the test
and more extensive reliability studies reported
support the claim of excellent reliability. The
second change, in standardization and norm
development, does, however, present a potential
problem which is accentuated by the absence of
validity data. This is discussed below.

Validity and Norms 

Although published in 1963, the validity
section of the revised WRAT was not available for
review until late in June 1964. The delay was
explained by the author of the test as occasioned
by comparison of the WRAT "with a number of
other tests in order to determine the meaning
and diagnostic value of the three subtests in
relation to other abilities." In addition, his letter
disclosed that "specific methods to identify, in
individual cases, the size of the independent and
separate variances will have to be developed.
Since this is somewhat of a novel and pioneering
venture, it takes more time than routine manual
preparation." The latter quotation is discussed
separately below.

The basis for the present evaluation is, then,
a comparison of the content and structure of the
1946 and 1963 editions of the WRAT, supplemented
by the limited independent literature on the 1946
edition, reviewed above, and the limited data on
the 1963 edition provided in the manual furnished
by the author. No independent studies of the 1963
edition were available.

Comparison of the Two Editions 

Examination of the two booklets indicates
close similarity in item content, format,
administration, and scoring. The Reading test for Level
I, in the revised edition, contains 55 words that
were in the 1946 edition, and the rank-order
correlation of their sequential positions in the two
editions is about 0.99. It is presumed that the 20 new
words were empirically calibrated to fit into the previously
established word order. The arithmetic items of
the new test are of the same general type as in
the earlier test, although the format is slightly
different and the number of items is increased.
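The "about 0.99" figure for the 55 shared words is a rank-order agreement. A minimal sketch of Spearman's rank correlation, using the no-ties formula ρ = 1 − 6Σd²/(n(n² − 1)) on hypothetical positions for ten words (not the actual WRAT items):

```python
def spearman_rho(rank_a, rank_b):
    """Spearman rank correlation for two rankings without ties."""
    n = len(rank_a)
    if n != len(rank_b) or n < 2:
        raise ValueError("need two rankings of equal length >= 2")
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical: sequential positions of the same 10 words in two editions,
# re-ranked 1..10 within the shared set.
old_positions = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
new_positions = [1, 2, 4, 3, 5, 6, 7, 8, 10, 9]
print(round(spearman_rho(old_positions, new_positions), 3))
```

Two adjacent swaps among ten words still give ρ of about 0.98.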

In view of this similarity, it appears reasonable
to expect that the network of correlations of the
revised test with other measures would be
approximately the same as that reported for the
1946 edition. In fact, the correlations might even
be slightly higher as a result of the greater
length of the revision. To the extent that
concurrent validity could be accepted for the 1946
edition, therefore, there is no reason to doubt
that it will be upheld with the 1963 edition.
Although the data are quite inadequate, tentative
acceptance on this point appears warranted, based
on the authors' reputations and the statements in
the manual. However, this is only part of the
problem.
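The expectation that a longer revision might correlate slightly higher with other measures rests on two classical test-theory relations. The first is the Spearman-Brown formula for the reliability of a test lengthened by a factor k. The second is that, with the criterion held fixed, a validity coefficient rises with the square root of the predictor's reliability. The minimal sketch below combines them, assuming the added items are parallel to the old ones. The reliability (0.80), validity (0.60), and length factors are hypothetical illustration values, not figures for either WRAT edition.

```python
import math


def spearman_brown(reliability: float, k: float) -> float:
    """Reliability of a test lengthened k-fold with parallel items (Spearman-Brown)."""
    return k * reliability / (1.0 + (k - 1.0) * reliability)


def lengthened_validity(validity: float, reliability: float, k: float) -> float:
    """Validity after lengthening by k, assuming parallel added items and an
    unchanged criterion: r_xy * sqrt(r_kk / r_xx)."""
    return validity * math.sqrt(spearman_brown(reliability, k) / reliability)


if __name__ == "__main__":
    # Hypothetical original test: reliability 0.80, validity 0.60.
    for k in (1.0, 1.25, 1.5, 2.0):
        print(f"k={k:.2f}  reliability={spearman_brown(0.80, k):.3f}  "
              f"validity={lengthened_validity(0.60, 0.80, k):.3f}")
```

Under these assumptions, even doubling the length raises the validity only from 0.60 to about 0.63. No amount of lengthening can push it past 0.60 / sqrt(0.80), about 0.67. That is consistent with the cautious phrase "slightly higher."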

Validation of 1963 Edition

It is equally important to be able to meaningfully
interpret the grade ratings, standard scores, and
percentiles in relation to individual age and
grade placement and in relation to population
parameters. In the absence of empirical information
on this issue, nothing definite can be concluded.
It is appropriate, however, to raise some questions
which have been generated by statements made in
the 1963 manual.

In the first place, the reviewer would take
issue with the test author's statement that a
representative national sampling is not essential
for proper standardization. A national sample is
certainly necessary if national norms are to be
promulgated. Although the 1946 edition was
developed on a restricted (as opposed to national)
sample, its norms were presumably keyed to the
grade norms of the New Stanford Achievement Test,
for which a more extensive base existed. Even
though regional, ethnic, and other perturbing
effects were not known, it was at least possible
to invoke the Stanford norms in interpreting grade
levels. With the 1963 edition, however, no such
anchoring process was followed. The only
indications concerning age-grade levels are, in
fact, disquieting.

The manual goes on to say that intelligence
quotients of a number of group and individual
tests (which are generally known to vary in level
among themselves) were used to select samples in
each age bracket "that would correspond to the
achievement of mentally average groups with
representative dispersions of scores above and
below the mean." (italics added) It would indeed
be remarkable if such a procedure could produce a
standard reference sample of known characteristics
for normative purposes. Therefore it is doubtful
that the resulting norms could have dependable
accuracy for individual assessment or for analysis
of groups in the manner required for the national
sample of the Health Examination Survey. Perhaps
the test author's current concern with comparisons
with other tests, referred to above, reflects
realization of this problem.

Furthermore, in view of the professed clinical
purposes of the WRAT, it is surprising that the
standardization research is confined to "mentally
average groups," and that no studies were
undertaken of such groups as gifted pupils,
students retarded in reading, arithmetic, and
other school subjects, disturbed children, and
subnormal children.

For the purposes of a national survey, problems
of ethnic and regional variations in test
performance are important, as are other sources
of perturbation attributable to deviations of
ability, personality, and physical and social
factors. The absence of such data for the 1963
WRAT is certainly not the sole responsibility of
the author-publisher; ordinarily test producers do
not assume responsibility for all possible research
of interest to all possible users. If a test
attracts interest, information about it in various
situations gradually accumulates in the literature.
However, in the present case it appears fair to
say that the author's confidence in his test led
him to publish the revision before he had completed
his own research and before research on it by any
users could be reported. The test was issued
without a formal designation of the norms as
"tentative" and without any qualifications.

Validity Variances 

Instead, the 1963 manual (301) concludes its
introductory section with the following paragraph:

In addition to the three operational aspects 
(of mechanics and comprehension in relation 
to each skill test) the basic skills have sever- 
al unique validities which will be explained 
later by reference to appropriate research. 
The validity variances will not only support 
the empirical distinctness of mechanics and 
comprehension, but will provide the degrees 
to which each is important in learning to
read, spell and figure and the impact the
relationship between them has on the total
learning process.

The burden of proof is on the author. The 
development of such an analytic scheme for inter- 
pretation of test scores is indeed both novel and 
ambitious and deserves all the time required to 
complete it. It seems regrettable, however, that
the test was released before critical users could
evaluate not only these devices, but even the grade
ratings, percentiles, and standard scores included
in the manual.

Validity Data in 1963 Manual 

The section of the manual entitled "Validity of
the WRAT" (301, p. 51) contains a table of means
and standard deviations of raw scores for the
Reading, Spelling, and Arithmetic subtests, which
indicates considerable need for refinement of the
tests in order to produce an even progression of
scores from grade to grade. The irregularities in
this progression are considerable at some levels
(8.0 to 8.5, 9.5 to 10.0, and 10.5 to 11.0 on the
Reading test, for example), to say nothing of the
fact that the basic difficulties reported about the
standardization sample are not only not clarified
but are not even referred to in this section of
the manual.

Two paragraphs on the validity of the Read- 
ing test (301, p. 50) refer only to the studies 
cited above, which involve the 1946 edition of the 
WRAT. No validity data on the 1963 edition are 
presented. Similarly, data are presented (301,
p. 52) on correlations of the WRAT with achievement
tests and on the validity of the Arithmetic
subtest, but these are also identified as relating
to the 1946 edition.

Internal consistency data cited by the author
(301, p. 53) involve intercorrelations among the
three WRAT subtests and not validity, despite the
author's assertion that "criteria of internal
consistency, if properly interpreted, are usually
more valid than are external criteria of
comparison." These data are also presented as "one
method of cross-validation."

Correlations of the Wide Range Achievement 
Test with the California Test of Mental Maturity 
are given (301, p. 54) for a sample of 74 children 
spanning the age range of 5 to 15 years. They 
range from 0.74 to 0.84 and may be spuriously 
high in view of the heterogeneity of the sample. 
Similarly structured comparisons with the WISC
for 300 boys (aged 5 to 15 years) and 244 girls
(aged 5 to 15 years) are reported which indicate
correlations as follows:



Sex and test 


Reading 


Arithmetic 


Boys 








0.65 


0.56 




0.41 


0.41 


Girls 








0.56 


0.56 




0.39 


0.50 



Based on Jastak's ^ore-form revision (311). 



In view of the composition of the sample, these are
surprisingly low.
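The reviewer's two judgments, that correlations from samples spanning ages 5 to 15 "may be spuriously high" and that the WISC figures are "surprisingly low" for such a sample, both rest on the dependence of a correlation on the spread of the sample. One conventional way to quantify this is the range-restriction formula usually called Thorndike's Case II. It is sketched below under that formula's assumptions of a linear, homoscedastic relation. The within-age correlation of 0.50 and the doubling of the standard deviation when ages are pooled are hypothetical. Pooling ages is only approximately a case of explicit selection on one variable, so this is an illustration rather than a correction to apply to the manual's figures.

```python
import math


def adjust_for_spread(r: float, sd_ratio: float) -> float:
    """Thorndike Case II: correlation expected when the SD of the selection
    variable changes by sd_ratio (new SD / old SD), assuming a linear,
    homoscedastic relation. sd_ratio > 1 widens the sample; < 1 narrows it."""
    u = sd_ratio
    return r * u / math.sqrt(1.0 - r * r + r * r * u * u)


if __name__ == "__main__":
    # Hypothetical: r = 0.50 within one age group; pooling ages 5 to 15
    # is assumed here to double the standard deviation of scores.
    pooled = adjust_for_spread(0.50, 2.0)
    print(f"within-age r = 0.50 -> pooled r = {pooled:.2f}")
    print(f"narrowed back: {adjust_for_spread(pooled, 0.5):.2f}")
```

Under these assumptions a modest within-age correlation of 0.50 appears as about 0.76 in the pooled sample. By the same logic, pooled correlations in the 0.4 to 0.65 range imply still weaker relations within any single age.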

The manual also reports (301, p. 55) correlations
of the WISC Verbal Scale, Performance Scale, and
Full Scale with the WRAT (1963), with samples
covering narrower age ranges of 5 to 7 years and
8 through 11 years. The results here are the most
impressive concurrent validity data in the manual,
although they indicate correlations in the 0.6 to
0.7 range with intelligence rather than with the
achievement criteria for which the test is
intended.

As stated several times earlier, the accuracy
of score levels in the WRAT norms is regarded as
a more pressing problem for empirical
demonstration than the concurrent validity
(covariation with related measures) of the test.
On this point the validity section of the manual
is silent.

Grade Equivalents 

The 1963 manual (301, p. 22) states that grade
norms were derived from "the actual mean grade
levels of the children in each grade group."
Despite variations in school grade-placement
practices over time, grade rating is characterized
as "rather stable." The manual further asserts
"striking comparability" of grade ratings of the
old and the new WRAT's "through nearly all
educational levels except the upper ranges." Grade
ratings below 14 years of age are said to be less
arbitrary than grade ratings over 14 years of age.
The grade scores are intended to be comparable to
mental ages.
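The manual does not say how a raw score falling between two grade-group means is converted to a grade rating. Linear interpolation between adjacent means is one common approach, and it is assumed in the sketch below. The norm table is invented for illustration. The sketch also shows why the uneven grade-to-grade progression of means noted earlier matters. Where two adjacent grade means differ by only one raw-score point, a single point moves the grade rating by half a grade.

```python
from bisect import bisect_right

# Hypothetical norm table: (grade placement, mean raw score of that grade group).
# Means must increase strictly with grade for the lookup to be defined.
NORMS = [(8.0, 40.0), (8.5, 41.0), (9.0, 45.0), (9.5, 49.0), (10.0, 50.0)]


def grade_equivalent(raw: float, norms=NORMS) -> float:
    """Linearly interpolate a grade rating from grade-group mean raw scores."""
    grades = [g for g, _ in norms]
    means = [m for _, m in norms]
    if raw <= means[0]:
        return grades[0]   # below the table: no extrapolation in this sketch
    if raw >= means[-1]:
        return grades[-1]  # above the table: no extrapolation in this sketch
    i = bisect_right(means, raw) - 1
    (g0, m0), (g1, m1) = norms[i], norms[i + 1]
    return g0 + (g1 - g0) * (raw - m0) / (m1 - m0)


if __name__ == "__main__":
    for raw in (40, 41, 44, 45):
        print(raw, round(grade_equivalent(raw), 3))
```

In this made-up table, moving from 40 to 41 raw points adds half a grade, while moving from 44 to 45 adds only an eighth of a grade.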

Standard Scores 

The WRAT standard scores can be converted from
raw scores by age group in a table provided in the
manual. The standard score has a mean of 100 and
a standard deviation of 15 and is intended to be
equivalent to an IQ from the WAIS, WISC,
Stanford-Binet (Form L-M), or any of the major
intelligence scales. Although these scales are not
comparable among themselves (as developed in some
detail in section I of this report), the manual
states that "the results from the WRAT test can
thus be directly compared with the major individual
intelligence scales."

The standard score is asserted to be the
"most precise and most meaningful score." It is
the only score that is comparable between subtests
and that provides for uniform differences between
scores.
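The manual presents the raw-to-standard-score conversion only as a table by age group. Suppose that table were a simple linear transformation of each age group's raw-score distribution. That is an assumption here, since the published table may instead be normalized or smoothed. The computation would then reduce to the sketch below. The age-group mean and standard deviation are hypothetical. The property of "uniform differences between scores" follows directly: equal raw-score steps within an age group map to equal standard-score steps.

```python
def standard_score(raw: float, age_mean: float, age_sd: float,
                   scale_mean: float = 100.0, scale_sd: float = 15.0) -> float:
    """Linear standard score (mean 100, SD 15) within one age group."""
    z = (raw - age_mean) / age_sd
    return scale_mean + scale_sd * z


if __name__ == "__main__":
    # Hypothetical age group: raw-score mean 30, SD 5.
    for raw in (25, 30, 35, 40):
        print(raw, standard_score(raw, 30.0, 5.0))
```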

Percentiles 

Percentiles are included "because of their
present popularity and convenience," but the
manual appropriately downgrades them and
discourages their use.
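The reason for downgrading percentiles can be shown in a few lines. Assume standard scores are normally distributed with mean 100 and SD 15, an idealization rather than a claim about the WRAT norms. Equal 15-point steps in standard score then correspond to very unequal steps in percentile rank.

```python
import math


def percentile_rank(score: float, mean: float = 100.0, sd: float = 15.0) -> float:
    """Percentile rank of a standard score under a normal distribution."""
    z = (score - mean) / sd
    return 100.0 * 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


if __name__ == "__main__":
    for ss in (100, 115, 130, 145):
        print(ss, round(percentile_rank(ss), 1))
```

The step from 100 to 115 spans about 34 percentile points. The equally large step from 115 to 130 spans only about 14. This is why differences between percentiles cannot be read as uniform differences in achievement.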

SUMMARY AND CONCLUSIONS 

The foregoing review of the WRAT is necessarily
incomplete because of lack of adequate information
on which to base a technical evaluation. The test
is well conceptualized and has much face validity,
but standardization information on the 1946
edition was inadequate, and on the 1963 edition it
is thus far insufficient.

Published research on the 1946 WRAT has been
extremely limited and fails to answer most of the
questions left unanswered by the authors' manual.
Moreover, analysis of the available information on
the 1963 edition raises doubts about normative
score levels.

The selection of the WRAT over other available
school achievement tests may be defended on the
grounds of administrative expediency and
suitability of the material for the purposes of
the Survey, in spite of the fact that inadequate
data exist to support the author's claims of
validity. It is possible that such data may be
produced, and every effort should be made to obtain
them. However, unless these results are convincing
(and reason to doubt that they will be has been
expressed), it is recommended that serious
consideration be given to carrying out a complete
restandardization of the Reading and Arithmetic
subtests on the entire national sample. Unless this
is done, projections of estimates to the population
may be seriously in error.

III. THE GOODENOUGH DRAW-A-MAN TEST



BACKGROUND AND DEVELOPMENT 

A comprehensive historical survey of the study
of children's drawings appeared recently in an
important new book by Dale B. Harris (522), a
former colleague of Florence Goodenough and
apparent successor to her in the leadership role
in the measurement of children's intelligence by
point scales based on drawings of the human
figure. The present review does not duplicate
Harris' scholarly survey, but focuses more
specifically on the problems of the Goodenough
Test as used in the Health Examination Survey.

The first formal intelligence test based on
the analysis of children's drawings was published
by Florence Goodenough (595) in 1926, but the
literature on this subject goes back at least to
1885 (595, ch. 1). Some of the early papers are
summarized in this study, but the major emphasis
has been placed on recent critical research on the
Draw-A-Man Test and its variants. Nevertheless,
it is of interest that in 1893 Herrick (501)
demonstrated the developmental significance of
profile drawings and that in the same year Barnes
(502) recognized that drawings are used by young
children as a means of expressing their ideas.
Meanwhile, Lukens (503), in 1896, outlined many
details of human figure drawings which were later
incorporated in the point-scoring systems of
Goodenough (595) and of Harris (522).

The Goodenough Test is referred to in this
discussion as the Draw-A-Man Test although the
specific instructions in Cycle II of the Survey are
to "make a picture of a person." However, the
instructions go on to state that "when a bust
picture has been drawn intentionally, the child is
given another sheet of paper with the instruction
'Now make a picture of a whole person.'" Only one
picture is used.

Rationale 

In this procedure emphasis is placed on the
representation of details in the drawing to measure
conceptual maturity. Drawing technique is
minimized, and distortions potentially usable as
cues for personality evaluation are not scored.
Recent drawing tests focused on personality study
have used two or more drawings. For example,
Machover (596) instructs the subject to "draw a
person" and then to draw a person of the sex
opposite to the one previously drawn, while Buck
(594) uses drawings of a house, a tree, and a
person. In general, the cues and signs interpreted
in personality study of drawings are different
from those employed for the measurement of
intelligence.

Point-Scoring System 

The point system developed by Goodenough
(595) for drawings which can be recognized as
attempts to represent the human figure, no matter
how crude, involves the presence or absence of
51 detailed points, which are listed as follows:

1-4a Head, legs, arms, trunk present

4b Length of trunk greater than breadth
4c Shoulders definitely indicated

5a Attachment of arms and legs
5b Legs attached to trunk; arms attached to
trunk at correct point

6a Neck present
6b Outline of neck continuous with that of
the head, of trunk, or both

7a-c Eyes, nose, mouth present
7d Both nose and mouth shown in two
dimensions; two lips shown
7e Nostrils shown

8a Hair shown
8b Hair on more than circumference of head;
nontransparent

9a Clothing present
9b At least two clothing items nontransparent
9c Entire drawing free from transparencies
of any sort; sleeves and trousers shown
9d At least four clothing items definitely
indicated
9e Costume complete without incongruities

10a Fingers present
10b Correct number of fingers shown
10c Detail of fingers correct
10d Opposition of thumb shown
10e Hand shown as distinct from fingers or
arm

11a Arm joint shown (elbow, shoulder, or
both)
11b Leg joint shown (knee, hip, or both)

12a-e Proportion: head, arms, legs, feet, two
dimensions

13 Heel shown

14a-f Motor coordination
a Lines reasonably firm and joining usually
accurate
b Increased firmness of lines and increased
accuracy of line junctions
c Head outline free from unintentional
irregularity
d Trunk outline free from unintentional
irregularity
e Arms and legs without irregularities,
narrowing at point of body junction
f Features symmetrical

15a Ears present
15b Ears in correct position and proportion

16a-d Eye detail: brow, lashes, or both shown;
pupil shown; proportion; glance

17a Both chin and forehead shown
17b Projection of chin shown; chin clearly
differentiated from lower lip

18a-b Profile drawings
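The raw point score is simply the number of items credited. The sketch below tallies it, assuming one point per item as the total of 51 implies. The item codes are expanded from the list, so "1-4a" becomes 1, 2, 3, 4a. Deciding whether a particular drawing earns a given item remains the scorer's judgment and is not modeled. The names ITEMS and point_score are illustrative only.

```python
# Item codes expanded from the 51-point list; "1-4a" -> 1, 2, 3, 4a, etc.
ITEMS = (
    ["1", "2", "3", "4a", "4b", "4c", "5a", "5b", "6a", "6b",
     "7a", "7b", "7c", "7d", "7e", "8a", "8b"]
    + [f"9{c}" for c in "abcde"]
    + [f"10{c}" for c in "abcde"]
    + ["11a", "11b"]
    + [f"12{c}" for c in "abcde"]
    + ["13"]
    + [f"14{c}" for c in "abcdef"]
    + ["15a", "15b"]
    + [f"16{c}" for c in "abcd"]
    + ["17a", "17b", "18a", "18b"]
)
VALID = frozenset(ITEMS)


def point_score(credited: set) -> int:
    """Count credited items, one point each, rejecting unknown item codes."""
    unknown = set(credited) - VALID
    if unknown:
        raise ValueError(f"unknown item codes: {sorted(unknown)}")
    return len(set(credited))


if __name__ == "__main__":
    assert len(ITEMS) == 51 and len(VALID) == 51
    # A crude drawing: head, legs, arms, trunk, eyes, nose, mouth, hair.
    print(point_score({"1", "2", "3", "4a", "7a", "7b", "7c", "8a"}))  # 8
```

Checking that the expansion yields exactly 51 distinct codes is a quick consistency test against the count stated in the text.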

Standardization 

In Goodenough's original research, point
scores based on these items were equated to age
norms from which intelligence quotients could be
computed in the same manner as in the
Stanford-Binet test. Data on reliability and
validity were reported in the 1926 book (595) and
also in a monograph (504) published the same year.
Using a basic standardization sample of 5,627
school children from kindergarten to the sixth
grade aged 4 to 12 years, split-half and retest
reliabilities were computed. A split-half
reliability of 0.77 (corrected) was found to be
constant from 5 to 10 years of age, and a retest
reliability coefficient of 0.94 was reported for
194 first-grade children. Correlations with
Stanford-Binet were 0.76 for mental ages and 0.74
for intelligence quotients. The experimental work,
analysis, and reporting which characterized this
undertaking would be regarded as impressive today,
and the critical reader of Goodenough's book can
well appreciate Lewis M. Terman's description of it
(in the foreword) as "a notable accomplishment."
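Two computations lie behind the figures just cited. An IQ computed "in the same manner as in the Stanford-Binet test" of that period is, on the usual reading, a ratio IQ: 100 times mental age divided by chronological age. A split-half coefficient is "corrected" with the Spearman-Brown formula for a test twice the length of each half. The sketch below shows both. The ages are hypothetical, and the half-test correlation is back-calculated from the reported 0.77 rather than taken from Goodenough's data.

```python
def ratio_iq(mental_age_months: float, chronological_age_months: float) -> float:
    """Ratio IQ: 100 * MA / CA, with both ages in the same units."""
    return 100.0 * mental_age_months / chronological_age_months


def split_half_corrected(r_half: float) -> float:
    """Spearman-Brown step-up from the half-test correlation to full length (k = 2)."""
    return 2.0 * r_half / (1.0 + r_half)


if __name__ == "__main__":
    # Hypothetical child: drawing mental age 9 years, chronological age 8 years.
    print(ratio_iq(108, 96))  # 112.5
    # Half-test correlation that corrects to the reported 0.77: r = 0.77 / (2 - 0.77).
    r_half = 0.77 / (2 - 0.77)
    print(round(r_half, 3), round(split_half_corrected(r_half), 2))
```

A corrected value of 0.77 therefore corresponds to a correlation of only about 0.63 between the two halves of the scale.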

Perspective

In 1950, a quarter of a century after the pub- 
lication of her book, Goodenough collaborated with 
Dale Harris in a review (510) of the extensive lit- 
erature generated by her test. This review was 
critical of many studies of graphic expression 
that lacked quantification, but it acknowledged the 
value of drawings used projectively as a source 
of diagnostic cues. Goodenough and Harris made
special note of some writers' attempts to treat
discrepancies between the Draw-A-Man Test and the
Stanford-Binet (cases in which Draw-A-Man IQ's are
markedly lower) as possible diagnostic cues of
emotional or nervous instability or of brain
damage. They also cautioned about the use of the
Draw-A-Man Test in cross-cultural comparisons,
pointing out that the Draw-A-Man is not a
culture-free test, as many users have incorrectly
assumed. This point is most dramatically
illustrated by the Near Eastern study of Dennis
(555).

In the Fourth Mental Measurements Yearbook,
1953, Stewart (514), while presenting a very
favorable evaluation, suggested that the
Goodenough norms might require revision due to
social changes which have occurred since the
original standardization. Such a revision was
apparently justified, and the new Goodenough-Harris
Drawing Test (552), published in 1963, fills an
important need. This modified procedure consists
of three drawings: a man, a woman, and "yourself."
Separate point scales are provided for drawings of
men and drawings of women; separate norms are also
provided for drawings made by boys (men) and
drawings made by girls (women).

An empirical study on a sample of 195 drawings
taken from the Health Examination Survey
population, in which the Harris scoring and norms
were compared with the original Goodenough scoring
and norms, is reported below. This study supports a
recommendation that the Harris revision be adopted
for scoring the Goodenough test in this Survey.

EVALUATION OF INTELLIGENCE
BY HUMAN FIGURE DRAWINGS 

Effective Range 

Barnes' (502) early observation that children
draw candidly up to about 14 years of age and
then more abstractly is supported by Barnhart
(507), who described three types of drawings:
schematic (graphic representation), predominating
in the age range 5 to 9 years; mixed, in the
range 8 to 13 years; and ftStto/reoZts/tc (abstract- 
ed, esthetic, nonspecific as to factual details), 
principally in the range 10 to 16 years. This
apparently explains why the point scores cannot
be validly extended above 14 years of age (522).

The increase in point scores with age, up to 
14 years of age, apparently reflects mental matur- 
ity and not chronological age. This was noted by 
Smith (506) and by McElwee (524), who reported
a correlation of 0.72 between the Draw-A-Man
and the Stanford-Binet mental ages for a sample
of 45 subnormal 14-year-old children. Israelite
(562) found a correlation of 0.71 between the
Draw-A-Man and the Stanford-Binet for 256 mental
defectives. Others have also successfully tested
mentally defective adults with the Draw-A-Man
Test.

Relation to Artistic Ability

An area of special interest in the interpretation
of children's drawings has been the relation of
drawing "maturity," as reflected in point score,
to artistic ability. Goodenough acknowledged that
drawings could be influenced by special coaching
(as can most human responses), but she held that
ordinary art instruction in school has little
effect on the Draw-A-Man score. She reported a
correlation of 0.44 between the Draw-A-Man and
teacher ratings of drawing ability (504).

Perturbing Factors 

Intelligence scores based on drawings are
relatively independent of artistic ability.
However, there is evidence that both internal
factors, such as health, emotions, and attitudes,
and external environmental factors affect the
drawing content. In the present review, studies
have been found which demonstrate the influence on
drawings of factors such as height and weight
(543), sex and body image (512, 537-539, and 541),
physical handicaps (571 and 572), mental age (521),
affective states experienced and experimentally
induced (529, 530, and 532), institutionalization
(540), teacher attitude (533), sociometric
popularity (534), social acceptance (531), and
social class (536).

Although size of drawings appears to increase
with mental age over the effective range of the
Draw-A-Man, size standards have not been
incorporated in any of the published point scores.
In general, the factors examined in the studies
referred to in the preceding paragraph may be
viewed as minor perturbing influences within a
homogeneous cultural framework. Variability among
drawings attributable to perturbing factors of the
types enumerated within the social boundaries of
the American culture appears to have significance
for the study of personality and social behavior,
but it does not appear to influence measures of
intelligence derived from children's drawings in
the age range 5 to 12 years.

Culture 

The factors which influence children's drawings
of the human figure most are those that reflect
the effects of a culture's customs and values,
since these determine the way in which children
are exposed to different representations of the
human figure in dress, art, photographs, religious
practices, and sex roles and attitudes. Hunkin
(554) found the Goodenough norms inapplicable to
Bantu school children, and Dennis (555) attributed
the steady decline in mean Draw-A-Man IQ from 5 to
10 years of age (among Egyptian and Lebanese
children in the Near East) to the Arab culture,
which restricts access to representations of the
human figure. Studies of the Draw-A-Man with
children of various American Indian tribes on
reservations (558-560) have produced varying
results which may perhaps be understood only in
the context of their respective culture patterns.

On the other hand, Anastasi and DeJesus (556)
found sex differences in agreement with Harris,
discussed below, but found no ethnic differences
in a comparison of Draw-A-Man scores of 50 Puerto
Rican children of low socioeconomic class in New
York City with those of Negro and white children
of similar status which were reported by other
investigators. Similarly, Levinson (243) found that
the Draw-A-Man, as well as WISC Block Design, is
culturally "fair" for native-born Jewish bilingual
children in New York City.

The importance of taking into account cultural
variations when dealing with a heterogeneous
population such as that sampled by the Health
Examination Survey is illustrated by the following
quotations from Harris (522, pp. 131 and 132).
These quotations have been excerpted to illustrate
how the customary dress of Eskimo children affects
point scores on drawings of the human figure.

Eskimo children are less likely to depict the
neck, the ears, and to correctly place the
ears. These facts seem to reflect the greater
prevalence of parkas in the Eskimo group's
drawings and [this] is thus an artifact of the
drawing situation. Due to the voluminous
parka garments, elbow joints, knee joints and
modeling of the hips are less likely [to be]
shown, resulting in greater stiffness of
figures portrayed.

Since the Eskimo boot does not have a heel,
Eskimo children are less likely to indicate
heels in their drawings. [Several instances],
however, show that when the garb is
appropriate, the heel is shown. The children do
have the concept of heels; their drawings are
quite appropriate to the type of figure they
are representing at the time. Eskimo children
are also less likely to portray the arm and
shoulder performing some type of movement,
probably due to the loose parka, though this
is not invariably the case.

On the other hand, Eskimo children are more
likely to portray with exactness the nostrils,
the bridge of the nose, and, when portrayed
at all, the thumb or fingers. The
characteristic tendency of the Eskimo children
to show a mittened hand earns for them a
greater credit on the thumb opposition point
and on the hand as distinct from fingers or
arm in the age group ten to thirteen
inclusive. In this age group also the Eskimo
is more likely to draw the arms down at the
side than held out stiffly from the body. The
Eskimo child is more likely to show the feet
with a wide stance, that is, with toes pointing
apart, or in perspective in either full-face
or profile drawings. The Eskimo drawings
include fewer transparencies in these age
groups, and a larger percentage of them earn
credit for showing a distinct costume, which
of course follows from the tendency to draw
the parka, the everyday costume in this part
of Alaska.

Aspects of the Eskimo drawings that are
distinctive and that are not apparent in the
detailed scoring technique of the Goodenough
method include: a greater emphasis on the
eyebrow, on the nostrils and nose (as
indicated above), and on general detail of facial
features. There is some evidence of a general
decrease in quality of the drawing in
adolescence. This is not sufficiently great,
however, to reveal itself markedly in the trend
of median scores as in the normative group. It
is most noticeable in the increased tendency
to draw the facial features and hands
"sketchily." Particularly among young Eskimo
children there is a very distinct tendency to draw
shorter arms and legs than in the norm group.
Here again there is the possibility that the
proportions of the body are distorted
somewhat by so many children depicting the
figures in parkas.

Cultural factors influence drawings in many
obvious ways such as type of garb, vehicles,
implements, and actions portrayed, but the nature
of the influence on a Goodenough-type point score
is subtle, as illustrated in the preceding
quotations from Harris. Because such variations are
often inconsequential within the mainstream of
American culture, there has been a wide temptation
to use the Draw-A-Man as a culture-free
intelligence test. Nevertheless, as Harris properly
insisted (522, p. 130), "the data . . . suggest
that the child's drawing of certain body features
or parts is influenced by garb, and possibly by
other conditions of living that call attention to
particular parts or their functions. Allowance
would have to be made, both in scoring and in
the norms, for parts omitted in one of these
cultures included in the present scoring system.
Such allowance would have to be worked out
empirically within each culture group." (italics
added)

Goodenough and Harris (510), in their 1950
review, affirmed that although the test may be
unsuited to comparing children across cultures,
it may still rank children within a culture
according to relative intellectual maturity. In his
1963 publication (522, p. 133), Harris has further
amended this position to state that "for the most
valid results, the points of the scale should be
restandardized for every group having a distinctly
different pattern of dress, mode of living, and
quality or level of academic education." In Harris'
judgment, "This conclusion virtually rules out the
scale for cross-cultural comparisons; indeed,
psychologists increasingly believe that mean
differences among large, representative samples
drawn from varying cultures express the gross
differences in conceptual experience and training
these groups have had. Further work, to determine
exactly which aspects of intellectual or conceptual
maturity the drawing task expresses, will be
necessary to explain scientifically these observed
cultural differences."

No systematic research such as Harris
delineated with respect to Eskimo children has been
done on the detailed effects of microvariations
within the American culture. Yet there is little
reason to doubt that subtle differences between
urban and rural, industrial and suburban, warm
climate and cold, eastern and western, and other
prominent contrasting situations within the
continental United States (to say nothing of Alaska
and Hawaii) might produce some significant
variations. Undoubtedly, some of these subcultural
variations reflect ethnic factors, such as the
superstitious reluctance of some southwestern
children of Mexican origin to draw eyes because
of fear of the "evil eye."

It is also possible that secular trends, which
are revealed in the comparison of the 1926 and
1963 norms, may be occurring at differential
rates in different localities and segments of the
culture and that these also may subtly affect
point scores. For example, the high-fashion
announcements of transparent garments for females
not only aroused different reactions among
different segments of the population but also
received widely varying prominence in different
localities. Although this is an extreme example,
it is nevertheless possible that some children
might draw the female figure appropriately
reflecting a sophisticated transparent garment and
be penalized on the point score (item 9c, for
instance, requires a drawing free from
transparencies) for what could be considered a
"bright" response.

Sex Differences 

Both Goodenough (504) and Harris (522) have
reported qualitative and quantitative differences
in drawings which are related to the sex of the
person doing the drawing. Harris' more recent
work is of greater relevance. He believes that
these sex differences cannot be attributed to
differential selection of boys and girls according
to intellect. Harris' recent data show that sex
differences in total point scores appear at an
early age and are considerably greater than those
reported by Goodenough. Harris found that for the
drawing of a man, the mean score difference favors
girls by about one-half year of growth at each year
of age, while for the drawing of a woman, this
difference is roughly equal to a full year of
growth. The Harris point scale, applied
differentially to Man and Woman drawings by boys
and by girls, appears to reduce mean differences.

Sex differences in drawing point scores reflect
differences in maturation, cultural factors
(including sex role and awareness), and perhaps
some degree of difference in drawing proficiency.
However, it is believed that these will be
minimized by the adoption of the Harris norms and
scoring system and that the remaining residual
error probably will be inconsequential. Without
doubt, the error will be smaller than that which
would result from the blanket use of one uniform
scoring system for the entire population.

PERSONALITY STUDY 
BY CHILDREN'S DRAWINGS 

Although personality evaluation is not the
primary reason for including the Draw-A-Man
Test in the Survey, a review of the potentialities
for such analysis is relevant. Since this topic has
been covered more extensively by Harris in his
recent publication than in this review, the
following discussion is organized in relation to
Harris' summary. Below are eight widely accepted
but not necessarily established generalizations
concerning personality measurement by children's
drawings. These were evaluated by Harris in his
recent book (522, p. 52). As will be noted, several
of the generalizations are rejected.

1. Drawing interpretation is more valid when
based on a series of a subject's protocols
than when based on one drawing. Despite
the lack of clear-cut empirical evidence
on this issue, Harris regards additional
pictures as having the effect of increasing
the length and therefore the reliability of
the test. On this logical ground, he
considers the principle justified.

2. Drawings are most useful for psychologi-
cal analysis when teamed with other avail-
able information about the child. This, too,
is a logically sound principle, "especially
when it is the content of drawings alone
that is being used for psychological in-
terpretation."

3. Free drawings are more meaningful psy- 
chologically than drawings of assigned 
topics. This is probably true for certain 
purposes, such as exploration of interests, 
but systematic comparison of individuals, 
as in a national survey, requires control 
of the task. 

4. When a human figure drawing is assigned,
the sex of the figure first drawn relates
to the image the drawer holds of his own
sex role. Of the studies summarized in
Appendix III, those most relevant to the
study of children ages 6 to 12 years are
as follows: 512, 537-539, 541, and 542.
According to Brown and Tolor (541), nor-
mal individuals of both sexes tend to draw
their own sex first, while persons with
behavior disorders draw the opposite sex
first. Harris agrees that most children of
either sex will draw their own sex first
when asked to "draw a person." He further
elaborates that as girls grow older there
is an increasing tendency for them to draw
a male figure. This, he feels, reflects both
the cultural preference given to the male
role and an increasing dissatisfaction with
the female role.



Harris also hypothesizes that the male
figure is more culturally stereotyped and
easier to draw than is the female figure.
He considers deviates from this norm to
be psychologically different from non-
deviates. He also feels that the deviation
has different meanings for the two sexes
and has unique, idiosyncratic meanings
to individuals. Since many deviations from
the norm occur and since the meaning of
such deviations is as yet unknown, it is
unlikely that the principle (the figure
drawn first relates to the image the
drawer holds of his own sex role) is uni-
versally valid. Therefore, even though
about 86 percent of boys and 65 percent
of girls have been reported to draw their
own sex first, it is not possible to for-
mulate any reliable interpretation for
those who do not.

5. A child adopts a schema or style of draw-
ing which is peculiar to him and which be-
comes highly significant psychologically.
Most of the evidence is opposed to this and
suggests rather that developmental pat-
terns do exist among children's drawings.

6. The manner in which certain elements are
portrayed in drawings may be used as
signs of certain psychological states or
conditions in the artist. In agreement with
Harris, the present writer regards this
statement as one of the eternal, unful-
filled wishful myths of the "depth psychol-
ogist." Two particular statements by
Harris are relevant to possible further
research in this frustrating area. First,
"whether or not 'signs' are selected by an
empirical or deductive procedure, there
is still the question whether form or con-
tent will provide the cues. Size, quality
or texture of line, degree of angularity,
pattern or shape, and placement on the
page are often thought to be highly signifi-
cant avenues for 'projecting' unconscious
motives or needs." References 512, 521,
537, 540, 543, 564, and 566 support this
view, but neither form nor content signs
of unequivocal value have thus far been
validated. Thus, Harris' second state-
ment, that "useful and valid signs leading
to dependable conclusions are, for the
most part, still to be ascertained," dis-
poses of this generalization.
7. Drawings must be interpreted as wholes
rather than segmentally or analytically.
This, too, has been a strong sentimental
favorite, but the evidence is mostly the
other way, particularly in personality
assessment. In fact, the history of psy-
chometric progress has been away from
global analysis toward specific analysis,
has favored linear over curvilinear rela-
tions, and generally has demonstrated that
quantitative procedures are more valid,
even if less spectacular, than those based
on scorer judgment.

Harris has cited analytic studies of com-
ponent qualities of children's drawings,
by Martin and Damrin and by Stewart
(522, p. 56), which suggest that "drawings
are actually appraised in terms of a few
general dimensions, although they may be
rated on a number of specifically defined
elements or qualities." Harris believes
that these studies lend credence to the
belief that broad, dimensional evaluations
(rather than highly particularistic ones),
based on such analytic results, may be
made more readily and more reliably. He
also believes that they suggest the direc-
tion these quantitatively and factorially
defined "global" ratings may take. "Their
findings in relation to personality quali-
ties, however, are not of such magnitude as
to support the use of drawings in diagnos-
ing individual cases."
8. The use of color in drawings can be sig-
nificant for studying personality. This is
another popular clinical belief, on which
the empirical evidence is equivocal.

RESEARCH ON THE 
GOODENOUGH TEST 

Reliability Studies 

Table 6 summarizes the reliability coeffi-
cients reported for the Draw-A-Man Test in the
studies included in this review (523-528). In
general, the reliabilities obtained by independent
investigators have confirmed those reported by
Goodenough. The reliability of the point scale
holds up in the mentally retarded range (523
and 524), and scorer agreement is high (526).

One problem that the reviewer observed in interscorer
comparisons, and that is mentioned below in connection
with the Goodenough vs. Goodenough-Harris comparison,
is that while the results of two scorers may show a very
high correlation, there may nevertheless be a constant
difference in score levels between them, reflecting
individual idiosyncrasies of their interpretations.
The safest method of coping with such constant
errors, in a survey in which a number of scorers
may be used for different segments of the total
sample, would be to have at least two people
score every test and to use the average of the
two for the record.
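A minimal sketch of the two-scorer check described above, assuming each scorer's point scores for the same drawings are held in parallel lists (the scores are hypothetical, not Survey data). It shows that a near-perfect Pearson r can coexist with a steady offset between scorers, which is why the mean difference is examined separately before the two scores are averaged for the record.

```python
from math import sqrt
from statistics import fmean


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def compare_scorers(scorer_a, scorer_b):
    """Return (r, mean difference B - A, averaged record scores)."""
    r = pearson_r(scorer_a, scorer_b)
    # A constant offset between scorers leaves r unchanged, so check it separately.
    mean_diff = fmean(b - a for a, b in zip(scorer_a, scorer_b))
    # The average of the two scorers serves as the record score.
    averaged = [(a + b) / 2 for a, b in zip(scorer_a, scorer_b)]
    return r, mean_diff, averaged


# Hypothetical point scores for five drawings; scorer B runs about 5 points high.
scores_a = [22, 30, 27, 35, 18]
scores_b = [27, 35, 33, 40, 23]
r, diff, avg = compare_scorers(scores_a, scores_b)
print(f"r = {r:.3f}, mean difference = {diff:.1f}")
```

Here r is close to 1 even though scorer B is consistently about 5 points higher, so a correlation alone would not reveal the level difference.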

Correlations With Other Tests 

Correlations of the Draw-A-Man with the 
Stanford-Binet are summarized in table 7, and 
its correlations with other tests, in table 8. 
Similar tables appear in Harris (522, pp. 96 and 
97). With few exceptions, correlations of the 
Draw-A-Man with the Stanford-Binet (in which 
coefficients are based on IQ's) reported by other 
investigators have averaged lower than those re- 
ported by Goodenough in 1926 (504). The ex- 
ceptions found are Williams (505), Israelite 
(562), White (565), and Ellis (unpublished master's 
colloquium paper, University of Minnesota, 1953),
whose data agree substantially with those of 
Goodenough. 

Unfortunately, most of the publications cited
which involve correlations of the Draw-A-Man
with the Stanford-Binet and a number of other
tests are based on very small samples (rarely
more than 100), are usually not representative
of their respective subuniverses, and do not
always present assurance of testing under stand-
ard conditions. As a result, the collection of
correlation coefficients can only be interpreted
very generally.

These results indicate a considerable as- 
sociation between the Draw-A-Man Test and 
general intelligence tests, such as the Stanford- 
Binet and the WISC, which measure mental 
maturity. The common variance is probably about 
50 percent. Maturationally, the original rationale 
presented by Goodenough, that drawing point



Table 6. Studies reporting reliability coefficients of human figure drawing tests 



Investigator 



Yepsen (523)- 



Brill (525)



Albee and Hamlin 
(579). 



Albee and Hamlin 
(581). 



Hinrichs (586)



Herron (532) 



McCurdy (527)--- 



Buhrer, de
Navarro, and
Velasco (511).



Franklel (518)- 



McHugh (508) 



Goodenough 
(504). 



Year 



1929 



1935 



1949 



1950 



1935 



1957 



1947 



1951 



1957 



1945 



1926 



Test and 

scoring method 



Goodenough- 



Goodenough - 



Human Figure 
Drawing, Paired 
Comparisons.



Machover- 



Goodenough



Goodenough- 



Goodenough- 



Goodenough- 



Goodenough and 
Franklel . 



Goodenough - 
Goodenough- 



Subjects 



Feebleminded



Feebleminded



VA Mental 
Hygiene Clinic. 
Range: normals
to psychotics.



Neurotic, 
schizophrenic,
normal. 



Normals- 



Normals , Grades 
3 and 4. 



Normals- 



Normals, 
Spanish- 
speaking. 



Normals



Normals, pre-
school.



Normals- 



Age range 



9.0 - 18.2 



N.R. 



N.R. 



N.R. 



10-18 years 



113 months 
(mean) 



83.9 months 
(mean) 



7-14 years 



Number 



37 



N.R. 

71 
65 
67 



7 years 
7 years 
12 years 
12 years 

62.0 months 
(mean) 

4-12 years 



72 



81 



16 



28 



24 



15 



59 



1,936 



200 

100 
100 
100 
100 

83 
5.627 



37 



71 
65 
67 



24 



28 



59 



15 



100 

50 
50 
50 
50 



Type of coefficient 



Test-retest
Administration 1-2
Administration 2-3
Administration 1-3



Test-retest 

Administration 1-2- 

Administration 2-3

Administration 1-3



Interjudge----- 
Spearman-Brown- 



Interjudge- 



Split-half, Spearman-
Brown. 



Test-retest, group A, 

Administration 1-2 

Administration 2-3--- 
Administration 1-3



Test-retest, group A 
Administration 1-2 — 
Administration 2-3 — 
Administration 1-3-- 



Test-retest, group B
Administration 1-2
Administration 2-3
Administration 1-3



Test-retest, group B
Administration 1-2
Administration 2-3
Administration 1-3



Test-retest- 



Intrajudge- 
Interjudge- 
Intrajudge- 
Interjudge- 



Test-retest- 



Split-half, Spearman-
Brown. 

Test-retest, Grade 1 only- 



See footnotes at end of table.

Table 6. Studies reporting reliability coefficients of human figure drawing tests, continued



Investigator 



Year 



Test and
scoring method 



Subjects (a)



Age range 



Number 



Type of coefficient 



Reliability
coefficient 



Williams (505)
Smith (506) 



1935 
1937 



Goodenough- 
Goodenough- 



McCarthy (526)- 



1944 



Goodenough- 



McHugh (529)- 



Stone (582)- 



1952 



1952 



Goodenough- 



Machover



Normals



Normals. Grades 
3 and 4. 



Normals, 
Grade 3. 



Normals , 
Grade 6. 



3-15 years



6 years 

7 years 

8 years 

9 years 

10 years 

11 years 

12 years 

13 years 

14 years 
15-16 years 

N.R. 



N.R. 



N.R. 



100 

POO 
100 
100 
100 
100 
100 
100 
100 
100 
100 
100 

386 



492 



50 



50 



Interrater- 



Test-retest



58 



Intrascorer 

Interscorer

Test-retest 

Odd-even, Spearman- 
Brown. 



60 



Intrajudge- 
Interjudge- 



Split-half
First drawing

Second drawing



Test-retest
Drawings 1 and 2, 
males 



Drawings 1 and 2, 
females — 



Drawings 1 and 2, 
total 



(a) Designations of subjects are always white Americans unless otherwise specified.
Indicates conditions preceding Draw-A-Man testing.



Initial test 

Satisfying activity 
Frustrating activity 



Second test 

Satisfying activity
Frustrating activity 



Third test 



Frustrating activity 
Satisfying activity 



0.80-0.96 



0.91 

0.91 

0.95 

0.96 

0.93 

0.95 

0.92 

0.92 

0.94 
0.84 



0.94 
0.90 
0.68 
0.89 



0.98 
0.97 



0.82 
0.76 



0.56 
0.39 
0.50 



NOTES: Unless otherwise indicated, it is assumed that reliability coefficients were Pearson product-moment and were computed from raw scores.

Σ = total population; M = male; F = female; N.R. = not reported; IQ = intelligence quotient; MA = mental age.

Table 7. Studies reporting correlations between the Goodenough and Stanford-Binet



Investigator 



Year 



Subjects (a)



Age range 



Number 



M F 



Correlations 



IQ 



MA 



McElwee (524) - 

Rohrs and Haworth (569)- 



1932 
1962 



Retarded- 
Retarded- 



14 years 



Familial
Organic



Birch (550)

Israelite (562) 

Johnson, Ellerd, and Lahey (592)- 
White (565)



Havighurst and Janke (544)

Fowler (531) 

Lessing (551)

McHugh (549) 



Thompson and Finley (552)



Goodenough (504)- 
Williams (505)--- 



1949 
1936 
1950 
19^3 

1944 
1953 
1961 
1945 

1963 

1926 
1935 



Retarded 

Feebleminded 

State hospital population



12.57 years 
(mean) 
9.2 years
(mean) 

10-6 - 16-3 

6-3 - 40 years 

6-9 - 17 years 



Feebleminded- 
Epileptic 

Normal 



Normals
Normals
Normals
Normals



Guidance clinic referrals



Normals 
Normals 



8-0 - 19-4 
8-0 - 19-4 
4-8 - 10-6 



10 years 

9-2 - 12-1 

8-9 years 

64 months 
(mean) 

5-9 years 

4-12 years 
3-15 years 



45 

^6 

20 
26 

68 
256 
,209 



141 
47 
47 
47 



114 
41 
23 
90 

164 

5.627 
100 



N.R. 

0.28 

N.R. 
N.R. 

0.62 
N.R. 
0.48 



0.72 



N.R. 
(Form 
L-M) 
N.R. 

N.R. 



0.69 
0.71 
N.R. 



50



50 



0.63 
0.52 
0.71 



0.50 
0.41 
0.51 
0.41 

0.67 

0.74 
0.65



N.R. 
N.R. 
N.R. 



N.R. 
N.R. 
N.R. 
0.45 



N.R. 

(Form 
L-M) 



0.76 
0.80 



(a) Designations of subjects are always white Americans unless otherwise specified.

NOTES: Unless otherwise indicated all correlations are Pearson product-moment, with the Stanford-Binet, Form L.

Σ = total population; M = male; F = female; IQ = intelligence quotient; MA = mental age; N.R. = not reported.



scores largely reflect the ability to form con-
cepts, is supported by the network of corre-
lations compiled from a variety of tests and
by studies such as that of McHugh (549), which
analyzed Draw-A-Man items. McHugh computed
biserial correlations of Goodenough items with
the Stanford-Binet and reported positive corre-
lations for 29 items; the remainder were zero or
slightly negative. The highest correlations, which
support the conceptual interpretation stated, were
the following:

Item                                    Correlation
2 (legs present)                            0.48
7a (eyes present)                           0.47
9a (clothing present)                       0.40
11b (leg joint shown)                       0.35
12e (proportion, two dimensions)            0.54
13 (heel shown)                             0.35
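McHugh's item analysis rests on biserial correlations between a credited/not-credited drawing item and a continuous criterion such as the Stanford-Binet. The sketch below uses the standard textbook biserial formula with invented data purely for illustration; McHugh's own items, scores, and computational details are not reproduced here, and the function name is hypothetical.

```python
from statistics import NormalDist, fmean, pstdev


def biserial_r(item_present, criterion):
    """Biserial correlation between a pass/fail drawing item and a continuous score.

    Textbook formula: r_bis = (M_p - M_q) / s * (p * q / y), where s is the
    standard deviation of all criterion scores, p and q are the proportions
    credited and not credited, and y is the standard normal ordinate at the
    point dividing the curve into p and q. Requires 0 < p < 1.
    """
    passed = [c for flag, c in zip(item_present, criterion) if flag]
    failed = [c for flag, c in zip(item_present, criterion) if not flag]
    p = len(passed) / len(criterion)
    q = 1.0 - p
    unit_normal = NormalDist()
    ordinate = unit_normal.pdf(unit_normal.inv_cdf(q))
    return (fmean(passed) - fmean(failed)) / pstdev(criterion) * (p * q / ordinate)


# Hypothetical data: 1 = item credited (e.g., "legs present"), paired with Binet IQs.
item = [1, 0, 1, 1, 0, 1, 0, 0]
binet_iq = [112, 95, 104, 99, 101, 118, 88, 93]
print(round(biserial_r(item, binet_iq), 2))
```

A positive value means children credited with the item tend to have higher criterion scores; a value near zero or below corresponds to the items McHugh reported as zero or slightly negative.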
Table 8. Studies reporting correlations between the Goodenough and other measures



Investigator



Year 



Test or criterion variable



Subjects 



Age range 



Correlation



Havighurst, Gunther, and
Pratt (558). 



Albee and Hamlin (579)- 



Havighurst and Janke
(544).

Havighurst, Gunther, and
Pratt (558). 

Hinrichs (586)

Johnson (557)

Boehncke (546)

Ansbacher (553) 



Brenner and Morse (517)- 



Havighurst and Janke
(544) . 

Brenner and Morse (517)- 



1946 



Hornowski (547)
Johnson (557) 



Brenner and Morse (517)



Shirley and Goodenough
(575) . 

Norman and Midkiff (559)



Harris (548)-- 
Johnson (557) - 



Brenner and Morse (517) — 



1949 

1944 
1946 
1935 
1953 

1938 
1952 

1956 

1944 
1956 

1961 
1953 

1956 



1932 
1955 



1959 
1953 



1956 



Arthur Point Scale of Performance 
Tests (IQ). 



Clinical ratings of adjustments- 



Cornell-Coxe Performance Ability 
Scale. 

Cornell-Coxe Performance Ability 
Scale. 

Furfey Revised Scale for Measuring 
Developmental Age in Boys. 



American 
Indians . 



6-11 years 



Zuni 

Hopi- — 
Navaho- 
Sioux — 
Papago- 



VA Mental 
Hygiene 
Clinic. 
Range: nor-
mals to psy-
chotics.
42
78 
47 
53 
74 



Normals



Hoffman Bilingual Schedule- 



Normals 

Delinquents- 



Spanish 
bilinguals 
(U.S.). 



Leiter International Performance 
Scale. 

MacQuarrie Test for Mechanical 
Ability. 

Tracing--- 

Tapping -- 

Dotting 



Metropolitan Readiness Tests, 
Number Readiness (IQ). 



Revised Minnesota Paper Form 
Board Test, Form AR. 



Normals- 
Normals



10 years 
6-11 years 
9-18 years 
N.R. 

5-12 years
10 years 



114 
66 

425 
30 

257 
100 



28 



38 



Normals



Normals- 



Monroe Visual subtest (IQ)- 



Moray House Picture Intelligence 
Test. 

Otis Self-Administering Tests of
Mental Ability. 



Picture Judgment of Maturity (IQ)--- 



Pintner-Cunningham Primary Mental 
Test (MA). 



Pintner Non-Language Primary 
Mental Test (IQ) . 



Normals 
(Scotland).

Spanish 
bilinguals 
(U.S.). 

Normals 



4-7 - 5-11 

10 years 
4-7 - 5-11 

N.R. 

N.R. 

4-7 - 5-11 



16 

110 
16 

N.R. 

30 

16 



Deaf- 



Progressive Matrices- 
Progressive Matrices- 
Reaction time 



Sangren Information Mental Age- 



Normals , 
American 
Indian. 

Normals 

Spanish 
bilinguals 

(U.S.).

Normals 



5+ years 
6-6 - 15-6 

5-1 - 6-1 
N.R. 

4-7 - 5-11 



229 
96 



98 
30 



16 



45 



53 



0.10 
0.21 
0.23 
0.33 
0.64 

0.62 
(rank order) 

0.64 
(product 
moment) 

0.63 



0.63 
0.35 
0.05 

0.83 



0.34 
0.23 
0.16 

0.55 
(rank order) 



0.48 



0.64 
(rank order) 



0.34 (M) 
0.49 (F) 

-0.02 



0.64 
(rank order) 

0.66 
(rank order) 

0.33 



0.24 (IQ) 
0.35 (MA) 



0.22 
0.43 



0.67 
(rank order) 



See footnotes at end of table.

Table 8. Studies reporting correlations between the Goodenough and other measures, continued



Investigator 



Buhrer, de Navarro, and 
Velasco (511) . 



Fowler (531)- 



Shirley and Goodenough
(575). 



Ansbacher (553)- 



Harris (548)



Brenner and Morse (517)-- 



Britton (536)- 



Hanvik (593)- 



Rohrs and Haworth (569) — 



1951 



1953 
1932 

1952 



1959 



1956 



1954 



1953 



1962 



Test or criterion variable 



School grades- 



Mathematics

Language ------------ 

Language and Mathematics- 
Drawing 



Social Distance Scale (Fowler) - 



Stanford Achievement, Education 
(quotient) . 



SRA Primary Mental Abilities- 



Subjects 



Normals , 
Spanish- 
speaking. 



Normals- 
Deaf 



Normals- 



Word Vocabulary 

Picture Vocabulary 

Total Verbal Meaning- 
Space 

Word Grouping 

Figure Grouping 

Total Reasoning 

Perception 

Number 



Total Nonreading

Total Score 

S+R+P 



SRA Primary Mental Abilities

Verbal 

Perception 

Quantitative 

Motor 

Space 



Teacher rank of school readiness 



Warner's Index of Status Charac- 
teristics. 



WISC Full Scale (IQ)



Wechsler Intelligence Scale for 
Children (IQ) . 



Normals- 



Normals



Psychiatric 
patients . 



Retarded, 
familial and 
organic . 



Verbal Scale - 

Performance Scale- 
Full Scale 



Age range 



7-14 years 



9-2 - 12-1 
5+ years 

10 years 



5-1 - 6-1 



4-7 - 5-11 



9-11 years 



5-12 years 



N.R. 



1,936 



41 
41 

100 



98 



16 



232 



25 



46 



19 



45 



102 



23 



22 



53 



130 



23 



Correlation 



-0.04 
-0.10 
-0.01 
0.27 

0.40



0.34 



0.23 
0.19 
0.26 
0.38 
0.28 
0.34 
0.40 
0.37 
0.24 

0.45 
0.41
0.48 



0.50 
0.44 
0.54 
0.40 
0.51 

0.69 (rho) 



0.11 



0.18 
(rank order) 



0.28 
0.53 
0.46



Designations of subjects are always white Americans unless otherwise specified.

NOTES: All correlation coefficients are Pearson product-moment unless otherwise specified.

Σ = total population; M = male; F = female; IQ = intelligence quotient; N.R. = not reported; MA = mental age.
It is of interest that a careful survey of the
literature spanning a period of over 40 years
fails to disclose any definitive pattern of the
particular components of mental maturity meas-
ured by the Goodenough test. Harris believes
that this may be attributed to the fact that such
components are themselves not clearly differ-
entiated in young children. The correlational
results do, however, suggest strongly that the
Draw-A-Man is more highly associated with
factors measured by performance tests than with
verbal abilities.

In the Health Examination Survey, corre-
lations of the Draw-A-Man with the WISC and, more
particularly, with the short form composed of
WISC Vocabulary and Block Design would be most
relevant. Table 3 includes three reports (115, 130,
and 224) which mention correlations between the
Draw-A-Man Test and the Full Scale IQ of the
WISC. None of these mentions correlations be-
tween the Draw-A-Man and the short form
of the WISC. Harris' summary also cites the
following unpublished data by Ellis.



             Correlation with:
Age  Number      FS      VS      PS
  8      16    0.70    0.77    0.67
  9      34    0.67    0.63    0.59
 10      20    0.24    0.17    0.26
 11      17    0.50    0.45    0.46
 12      19    0.62    0.50    0.68
 13      17    0.13    0.05    0.15



Disregarding the 13-year-old group, since it is
outside the effective range of the test as well
as outside the age range of the Survey, Ellis'
results for the total sample of 106 have an
average correlation with the WISC Full Scale
IQ of 0.57. Again, this is higher than the corre-
lations reported by others.
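The text does not say how the average of 0.57 was obtained. Two common ways of pooling correlations can be compared in a short sketch; which, if either, the author used is an assumption. Weighting each age group's r by its number of children gives about 0.56, while averaging Fisher z-transformed values (also weighted by N, a simplification of the usual N - 3 weights) and transforming back gives about 0.575, closer to the reported figure.

```python
from math import atanh, tanh

# (number of children, r with WISC Full Scale) for Ellis' ages 8-12, from the table above.
ELLIS_FS = [(16, 0.70), (34, 0.67), (20, 0.24), (17, 0.50), (19, 0.62)]


def weighted_mean_r(rows):
    """Average of r weighted by sample size."""
    total_n = sum(n for n, _ in rows)
    return sum(n * r for n, r in rows) / total_n


def fisher_mean_r(rows):
    """Average via Fisher's z = atanh(r), weighted by n, then back-transformed."""
    total_n = sum(n for n, _ in rows)
    z_bar = sum(n * atanh(r) for n, r in rows) / total_n
    return tanh(z_bar)


print(sum(n for n, _ in ELLIS_FS))          # 106
print(round(weighted_mean_r(ELLIS_FS), 3))  # 0.557
print(round(fisher_mean_r(ELLIS_FS), 3))    # 0.575
```

The z transformation is usually preferred because the sampling distribution of r is skewed when r is far from zero, whereas z is approximately normal.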

In summary, it appears that the WISC corre-
lations with the Draw-A-Man Test are substantial
but lower than those of the Stanford-Binet.
They are, however, higher with the Performance
Scale than with the Verbal Scale (except in
Ellis' two youngest age groups).
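The "common variance" of about 50 percent cited earlier in this section is the squared correlation. A short sketch makes the arithmetic explicit: 50 percent common variance corresponds to an r of about 0.71, while Ellis' average WISC Full Scale correlation of 0.57 implies only about a third of the variance in common. The function names are illustrative.

```python
from math import sqrt


def shared_variance(r):
    """Proportion of variance two measures have in common, estimated by r squared."""
    return r ** 2


def r_for_shared_variance(proportion):
    """Correlation needed to produce a given proportion of common variance."""
    return sqrt(proportion)


print(round(r_for_shared_variance(0.50), 2))  # 0.71
print(round(shared_variance(0.57), 2))        # 0.32
```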

In comparing Draw-A-Man scores with WISC
Full Scale estimates, there is no reason to assume
any systematic differences in mean levels across
the entire population. However, for statistical
estimation as well as analytic purposes, it is
most appropriate to compute the regression of
Draw-A-Man on Vocabulary (Voc.), Block Design (BD),
and Total Score and then to work with differences
between regressed and actual scores for discrepancy
analysis, rather than with differences between scaled
scores.
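The regression-based discrepancy analysis recommended above might be sketched as follows, fitting one predictor at a time by ordinary least squares. The predictor and score lists are hypothetical, and the function names are illustrative; a joint fit on Vocabulary and Block Design together would need a multiple-regression routine, which this sketch does not attempt.

```python
from statistics import fmean


def fit_line(x, y):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


def regression_discrepancies(predictor, draw_a_man):
    """Actual minus regressed (predicted) Draw-A-Man score for each child."""
    a, b = fit_line(predictor, draw_a_man)
    return [y - (a + b * x) for x, y in zip(predictor, draw_a_man)]


# Hypothetical data: WISC short-form totals and Draw-A-Man standard scores.
short_form = [18, 22, 25, 20, 30, 16]
dam = [92, 101, 104, 97, 118, 95]
residuals = regression_discrepancies(short_form, dam)
print([round(e, 1) for e in residuals])
```

Children with the largest absolute discrepancies would be the ones to review against the WRAT scores and other Survey information.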

In view of the Draw-A-Man's sensitivity to 
cultural variations, cases in which there are 
large discrepancies between the Draw-A-Man 
and the WISC should be thoroughly evaluated in 
the light of the WRAT scores and other infor- 
mation from the Health Examination Survey. 
Although Harris' summary and the reports con- 
sulted in this review have suggested a number of 
promising diagnostic score patterns, none of them 
seem well enough established to be adopted. 

THE HARRIS REVISION OF THE 
GOODENOUGH TEST 

Dale Harris' 1963 publication (522), which
he has named the Goodenough-Harris Drawing 
Test, is a thorough revision and extension of 
Goodenough's test. As already mentioned, it bases 
the lengthier point-score scales on both drawings 
of the male figure and drawings of the female 
figure, for which it provides separate norms for 
boys and for girls. A third picture, in which the 
child draws a representation of himself, has not 
been empirically standardized. 

Standardization of the Harris revision was
completed on a total sample of 2,965 children,
representative of four major geographic areas of
the country. The sample was also representative
of the 1960 census distribution of fathers' occupa-
tions. Total point scores are converted to standard
scores with a mean of 100 and a standard deviation
of 15. Conceptually, these are equivalent to the
WISC deviation IQ's. The new scales overlap
extensively with the original point scales, and
Harris found that children now earn substantially
higher scores when the 1963 norms, rather than
the 1926 ones, are utilized. The explanation for this
phenomenon is not clear. The new norms do
appear to take into account technical and social
changes which have occurred between 1926 and
1963. They also offer the advantages of greater
length (hence, higher reliability) and more ad-
equate provision for sex differences.
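As a rough illustration of what a standard score with a mean of 100 and a standard deviation of 15 expresses, the sketch below assumes a simple linear (z-score) conversion within the child's age group. The norm values are invented and the function name is hypothetical; in practice the conversion tables published by Harris, not this formula, would be used.

```python
def standard_score(raw_points, norm_mean, norm_sd, scale_mean=100.0, scale_sd=15.0):
    """Express a raw point score as a deviation score within the child's age group.

    norm_mean and norm_sd are the raw-score mean and standard deviation of the
    norm group at the child's age; the defaults give the mean-100, SD-15 scale.
    """
    z = (raw_points - norm_mean) / norm_sd
    return scale_mean + scale_sd * z


# Hypothetical norms for one age group (NOT Harris' published values).
print(standard_score(36, norm_mean=30.0, norm_sd=6.0))  # 115.0
```

A child scoring one norm-group standard deviation above the age mean thus receives 115, just as on the WISC deviation IQ scale.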

Comparison of Goodenough and
Goodenough-Harris Scores

It seems desirable to inquire whether the
Harris scales and norms could be used to score
human figure drawings obtained in the Health
Examination Survey. As noted above, in this
Survey only one picture is drawn by each child,
who is instructed, "Make a picture of a person.
Make the very best person that you can." To use
the Harris scales in the Survey it would be
necessary for the scorer to decide whether each
drawing was of a "Man" or of a "Woman."

A sample of 200 drawings, 100 drawn by boys
and the other 100 drawn by girls, was taken at
random from the Survey files. These drawings
were then carefully scored using Harris' norms,
and the scores obtained were compared with the
scores the drawings had already received on the
1926 Goodenough scale. (Scoring by the 1926
method is completed in the field by Survey staff
psychologists.)

Of the 200 cases, 195 were usable. Three
drawings were rejected because they contained
a face only, and for two cases age had been in-
advertently omitted, precluding the computation
of standard scores. For the remaining drawings,
neither scorer reported any difficulty in identi-
fying the sex represented, and their agreement
on this was perfect.



Table 9. Means of Goodenough-Harris and Goodenough variables and correlations between
scorers and between methods for total sample and six subsamples

                                                Total    Woman      Man    Woman    Woman      Man      Man
Variable                                        group                    by boys by girls  by boys by girls
                                                N=195     N=94    N=101     N=17     N=77     N=83     N=18
1.   Goodenough-Harris point (A)                30.75    31.41    30.13    28.12    32.14    30.20    29.78
2.   Goodenough-Harris SS (A)                   96.59    95.89    97.24    93.06    96.52    97.29    97.00
3.   Goodenough-Harris point (B)                36.02    36.62    35.47    34.71    37.04    35.54    35.11
4.   Goodenough-Harris SS (B)                  105.97   105.15   106.73   104.06   105.39   106.63   107.22
1+3. Average Goodenough-Harris point (A,B)      33.39    34.02    32.80    31.42    34.59    32.87    32.45
2+4. Average Goodenough-Harris SS (A,B)        101.28   100.52   101.99    98.56   100.96   101.96   102.11
5.   Goodenough point                           26.38    25.57    27.14    24.29    25.86    27.20    26.83
6.   Subject's CA                              115.01   111.89   117.92   118.33   110.47   118.10   117.11
7.   Goodenough MA                             114.61   112.48   116.59   108.88   113.27   116.71   116.06
8.   Goodenough IQ                             101.23   102.27   100.27    92.59   104.42   100.10   101.06
     r13                                         0.90     0.89     0.91     0.82     0.91     0.90     0.95
     r24                                         0.90     0.88     0.91     0.79     0.89     0.92     0.83
     r28                                         0.78     0.76     0.81     0.60     0.78     0.87     0.47
     r48                                         0.81     0.78     0.84     0.58     0.82     0.89     0.48

NOTE: N = number; A = scorer A; B = scorer B; SS = standard score; CA = chronological age;
MA = mental age; r = correlation (subscripts give the numbered variables correlated).
Woman, Man = drawings of a woman, of a man.






The usable sample of 195 cases consisted
of 100 boys and 95 girls. Of these, 17 boys drew
a Woman figure and 18 girls drew a Man figure.
The remaining 82 percent of the total group
(83 percent of the boys and 81 percent of the
girls) drew their own sex.
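The own-sex percentages follow directly from the counts just given; a short check (the helper name is illustrative):

```python
def own_sex_percent(n_children, n_opposite_sex):
    """Percentage of children who drew a figure of their own sex."""
    return 100 * (n_children - n_opposite_sex) / n_children


boys = own_sex_percent(100, 17)
girls = own_sex_percent(95, 18)
total = own_sex_percent(100 + 95, 17 + 18)
print(round(boys), round(girls), round(total))  # 83 81 82
```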

The following eight variables were recorded
for all 195 cases:

1. Harris method, point score, scorer A

2. Harris method, standard score, scorer A

3. Harris method, point score, scorer B

4. Harris method, standard score, scorer B

5. Goodenough point score

6. Subject's chronological age in months

7. Goodenough mental age

8. Goodenough IQ

Means, standard deviations, and intercorrelations
were computed for the total sample and for the
following six subsamples: (1) Woman drawings
(N=94), (2) Man drawings (N=101), (3) Woman
drawings by boys (N=17), (4) Woman drawings
by girls (N=77), (5) Man drawings by boys (N=83),
and (6) Man drawings by girls (N=18). A summary
of the most relevant results, for all seven sample
combinations, appears in table 9.

The correlations between the two scorers
(r13 and r24) are high despite a systematic tend-
ency for scorer B's results to exceed those of
scorer A (they average 5.25 above scorer A on
point score and 9.38 higher on standard score).
As a more stable estimate of the Harris scores
for comparison with the Goodenough, average
mean scores for the two scorers were computed.
These appear in table 9 between variables 4 and 5.

Although agreement between the two scorers
is generally high, the lowest correlations were
found for the 17 boys who elected to draw a
female figure (subsample 3). The standard score
correlations for the 18 girls who elected to draw
a male figure (subsample 6) are also com-
paratively low. These opposite-sex drawings
also reflect the lowest correlations between
Harris and Goodenough IQ's for both scorers
(r28 and r48). Thus scorer agreement is lowest
on opposite-sex drawings, and the results for
these show the poorest agreement, correlation-
wise, between the Goodenough-Harris and Good-
enough IQ's. It is possible that these differences
could be eliminated by further training of scorers.
Certainly these results illustrate the importance
of quality control of scoring. The averaging pro-
cess is also highly recommended if systematic
scorer differences cannot be eliminated.

The principal support, indicating an advantage
of the Goodenough-Harris scale, appears in the
comparison of mean scores for boys and girls on
Woman and Man drawings as abstracted in table
10. In accordance with Harris' own findings, girls
score higher than boys, but the differences are
greater on the Goodenough scale than on the Good-
enough-Harris scales and are greater on the
Woman drawings than on the Man drawings. The
greatest discrepancy and resulting scoring pen-
alty by the Goodenough scale occurs in the case
of the 17 percent of boys (subsample 3) who
elected to draw a Woman. At the same time, the
81 percent of girls (subsample 4) who elected to
draw their own sex received disproportionately
high scores on the Goodenough, in comparison
with the mean levels on the Goodenough-Harris.
The Goodenough-Harris scores are higher than the
Goodenough for both sexes on the Man drawing.

The problems with the Woman drawing clearly 
support the observation, first pointed out by 
Goodenough and strongly reiterated by Harris, 
that the female figure is more culture-bound 
than the male, is less stereotyped, and is more
susceptible to individual interpretation. Although 
the data on which the present analysis is based 
are limited, they do suggest that the Harris 
revision does less violence to the female figure 
than does the Goodenough scoring and that, in 
general, the Harris revision is more adequate for 
opposite-sex drawings. 

These data, which indicate a superiority of 
girls over boys in drawing scores, a tendency 
for the Goodenough-Harris scores to be higher 
than the Goodenough scores, and a tendency for 
girls who draw male figures to be older than girls 
who draw their own sex (while no such differ- 
entiation occurs among boys), are all consistent 
with trends reported elsewhere in the literature. 
However, the most important argument in favor of using the Goodenough-Harris scoring system is that the variation of mean scores among the four subsamples is thereby greatly reduced around a mean of 100. This range is from 92.59 to 104.42 (11.83) on the Goodenough and from 98.56 to 102.11 (3.55) on the Goodenough-Harris. Although the standard deviations of the Goodenough-Harris and Goodenough scores were not shown in table 9, the relative variability of scores based on the two systems is indicated in table 11, which reports the coefficient of variation (standard deviation divided by the mean) for Goodenough-Harris standard scores and for Goodenough IQ's for the total group and each of the subsamples. It is apparent that in every case the relative variability is lower for the Harris scores.

Table 10. Comparison of Goodenough-Harris and Goodenough mean IQ's for boys and girls on same-sex and opposite-sex drawings

                      Drawing of a woman                   Drawing of a man
Sex                   Goodenough  Goodenough-  Difference  Goodenough  Goodenough-  Difference
                      IQ          Harris IQ                IQ          Harris IQ
Boys                  92.59       98.56        +5.97       100.10      101.96       +1.86
Girls                 104.42      100.96       -3.46       101.06      102.11       +1.05
Difference            11.83       2.40                     0.96        0.15

Note: The Difference columns give the Goodenough-Harris IQ minus the Goodenough IQ; the Difference row gives the girls' mean minus the boys' mean.

Table 11. Coefficients of variation of Harris and Goodenough IQ's for total sample and six subsamples

                      Total   Drawings  Drawings  Drawings of a woman Drawings of a man
Item                  group   of a      of a      By boys   By girls  By boys   By girls
                              woman     man
Goodenough-Harris     0.16    0.15      0.15      0.10      0.16      0.17      0.13
Goodenough            0.19    0.18      0.19      0.14      0.18      0.21      0.18
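The range figures quoted above follow directly from the four subsample means in table 10, and the coefficients of variation in table 11 are standard deviations divided by means. The short Python sketch below makes both computations explicit. It is illustrative only: the subsample labels are descriptive names chosen here, and the use of the sample (n - 1) standard deviation in `coefficient_of_variation` is an assumption, since the report does not say which form was used.

```python
from statistics import mean, stdev

# Mean IQs transcribed from table 10: (Goodenough, Goodenough-Harris).
TABLE_10_MEANS = {
    "boys, drawing of a woman": (92.59, 98.56),
    "girls, drawing of a woman": (104.42, 100.96),
    "boys, drawing of a man": (100.10, 101.96),
    "girls, drawing of a man": (101.06, 102.11),
}

def spread(values):
    """Return (lowest, highest, range) for a set of subsample means."""
    low, high = min(values), max(values)
    return low, high, round(high - low, 2)

def coefficient_of_variation(scores):
    """Relative variability: standard deviation divided by the mean.

    Uses the sample (n - 1) standard deviation; this choice is an assumption.
    """
    return stdev(scores) / mean(scores)

goodenough = [g for g, _ in TABLE_10_MEANS.values()]
harris = [h for _, h in TABLE_10_MEANS.values()]
print("Goodenough range:", spread(goodenough))
print("Goodenough-Harris range:", spread(harris))
```

Running it reproduces the ranges of 11.83 and 3.55 points cited in the text; `coefficient_of_variation` would be applied to the individual scores within each subsample, which are not reproduced in this report.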

Recommendation 

On the basis of this analysis it is recommended that the following steps be adopted in relation to the Draw-A-Man Test in the Survey:

(1) the Goodenough-Harris system should be used;

(2) the entire sample should be scored centrally by uniform standards, with adequate training of scorers and quality control procedures routinely followed; and

(3) if scorer variations cannot be eliminated by training, the procedure of averaging the results of two or more scorers should be adopted.
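The averaging procedure in step (3), together with a routine check for systematic scorer differences, can be expressed in a few lines. In this sketch the scorer labels and scores are hypothetical, every scorer is assumed to have scored the same drawings in the same order, and a scorer's "bias" is taken simply as the mean difference between that scorer's scores and the across-scorer average. This is one simple quality-control check, not a procedure prescribed by the Survey.

```python
from statistics import mean

def averaged_scores(scores_by_scorer):
    """Average the scores that several scorers assigned to the same drawings.

    scores_by_scorer maps a scorer label to a list of scores, with the
    drawings in the same order for every scorer.
    """
    per_drawing = zip(*scores_by_scorer.values())
    return [mean(scores) for scores in per_drawing]

def scorer_bias(scores_by_scorer):
    """Mean signed difference between each scorer and the averaged score.

    A value far from zero suggests a systematic (rather than random)
    scorer difference of the kind training is meant to eliminate.
    """
    consensus = averaged_scores(scores_by_scorer)
    return {
        scorer: mean(s - c for s, c in zip(scores, consensus))
        for scorer, scores in scores_by_scorer.items()
    }

# Hypothetical scores from two scorers on the same three drawings.
example = {"scorer A": [100, 96, 104], "scorer B": [104, 100, 108]}
print(averaged_scores(example))
print(scorer_bias(example))
```

In this example scorer B runs consistently 4 points above scorer A, so each scorer sits 2 points from the averaged score; averaging removes that systematic difference from the reported scores even though it does not remove it from the scorers.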



SUMMARY AND CONCLUSIONS 

The foregoing review of the Draw-A-Man Test supports the view that it is a reliable and valid nonlanguage measure of mental maturity, although highly sensitive to cultural influences on the child's conceptual representation of the human figure. Its use in a national survey in the 6 to 12 age range, in conjunction with the WISC and WRAT, is logical and desirable, particularly as a means of assessing intellectual development in cases in which there is impairment of verbal development or verbal performance.

Personality assessment by means of thematic
and qualitative assessment of children's drawings 
would probably be unrewarding. Some indications 
justifying further research have been noted; how- 
ever, such research is not sufficiently promising 
to warrant the expenditure of Survey funds. On 
the other hand, several lines of empirical work 
appear worthwhile. These are enumerated below. 

As discussed in the final portion of the review of the Draw-A-Man, there is strong evidence for the adoption of the Harris revision of the Draw-A-Man, with central scoring by trained scorers and averaging of the scores of two or more scorers if scorer variations cannot be eliminated in training. This procedure need not be regarded as expensive, since it could leave the field psychologists free to test more children while the scoring is done centrally by lower-paid workers.

Although research on personality-assessment uses of the drawings within the Survey program is not recommended, the following lines of empirical study and analysis are regarded as useful and even important:

1. A systematic study of cultural variations related to the principal geographic areas in which Survey data were collected, to evaluate the effects of factors such as customs, attitudes, dress, art, and social roles in relation to the items in the point scales by which the Draw-A-Man is scored. Even if such an analytic study should yield negative results (no appreciable cultural effects), these would be very reassuring in relation to the use of the Draw-A-Man scores in the Survey.
2. Regression studies of Draw-A-Man 
scores with other psychometric variables 
in the Survey so that comparisons can 
be made on the basis of differences be- 
tween regressed and actual scores rather 
than directly between raw scores. 

3. Further restandardization of the Good- 
enough-Harris norms on a national sample 
would be a valuable contribution to psycho- 
logical measurement of children that 
could only reflect credit on the Survey 
and would be of major importance for 
future use of this well-established and 
useful intelligence test. This significant 
undertaking, if approved, should include a 
complete item analysis as well as recomputation of norms.
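Item 2 proposes comparing regressed (predicted) and actual scores rather than raw scores. The following is a minimal sketch of that comparison, assuming a simple least-squares line that predicts Draw-A-Man scores from a single other Survey measure. The choice of predictor and the paired scores shown are hypothetical, and a full study would likely use several predictors and the Survey's sampling weights.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least-squares intercept and slope for predicting y from x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope

def regression_residuals(x, y):
    """Actual minus regressed (predicted) score for each child."""
    intercept, slope = fit_line(x, y)
    return [b - (intercept + slope * a) for a, b in zip(x, y)]

# Hypothetical paired scores: another Survey measure (x), Draw-A-Man (y).
other_measure = [90, 100, 110, 120]
draw_a_man = [94, 96, 112, 114]
print(fit_line(other_measure, draw_a_man))
print(regression_residuals(other_measure, draw_a_man))
```

A positive residual marks a child whose drawing score is higher than the other measure would predict, and a negative residual the reverse; comparing residuals avoids treating the two tests as if they were on the same scale.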

Some additional suggestions regarding cross-disciplinary studies with reference to the Draw-A-Man Test are presented in a later section of this report.
IV. THE THEMATIC APPERCEPTION TEST



The technology of personality measurement lags far behind that of ability and achievement measurement. This lag creates difficulties for organizations (such as the Division of Health Examination Statistics) that seek to estimate population parameters on the basis of definitive test scores. At present there is not a single personality test for children that could be recommended without qualification. In view of the extensive use of personality tests in clinical practice and in school situations, this sweeping statement may appear extreme. It is, nevertheless, regrettably true. Perhaps clinical psychologists can justify their use of various personality measures on the basis of intensive individual case study, in which test responses and scores are interpreted by the clinician in relation to consistent patterns of performance in the context of a total life record. The clinician usually feels free to accept or disregard information in this frame of reference, and he often employs informal, unstandardized "tests" as well as published procedures without regard for formal considerations of reliability and validity. Furthermore, since clinical judgments are confined to individual cases, they are not subject to verification by the rules of evidence observed in scientific studies. Educators often justify their personality testing as a contribution to research; this is an important justification and, in the light of the facts, the only tenable one.

In contrast with the clinical and research uses of personality measures, where legitimacy is not primarily a function of the proven adequacy of the measurement instruments employed, surveys such as this one (HES) operate under severe constraints. The survey scientist must defend the validity and reliability of his instruments as well as the adequacy of his sampling design for the purposes of his survey; both considerations affect the validity of population estimates from sample data.

The choice of a personality measurement 
instrument for Cycle II must be considered in the 
context of the preceding discussion. Although the 
California Personality Test and Cattell's Junior 
Personality Quiz are, in the opinion of the writer, 
the most adequately documented of the currently 
published and objectively scored personality tests
for children, neither meets the reliability and 
validity standards necessary for Survey use and 
neither is appropriate for the entire age range of 
6 through 11 years. Apart from these, no available 
tests even approach the requirements of this 
Survey. 

In the psychometric sense, the Thematic
Apperception Test (TAT) is not a test. It is a 
projective device consisting of a series of am- 
biguous (unstructured) pictures individually pre- 
sented to the subject (or patient), who is asked to 
imagine and relate a story. The rationale of the 
procedure is that people will seek to create 
structure when a stimulus situation is unstruc- 
tured and that in doing so they will draw on their 
own experience, needs, attitudes, and values to 
provide the details. This process is viewed as a 
"projection" of inner processes on the un- 
structured stimulus. 

The TAT was developed by Henry A. Murray 
of Harvard University in 1938 (788). At the same 
time he presented a report which outlined a 
motivational system of organismic needs and en- 
vironmental presses. This report was highly in- 
fluential and stimulated much research. Five 
years later (in 1943), the TAT pictures and a 
manual for their use were published (799). 

From the objective scoring standpoint, it is 
necessary to recognize that all projective methods 
share a major problem, since in all of them the
testing strategy depends on the process by which 
subjects add structure to ambiguous stimuli. 
Although this structuring process does involve 
projection, in the sense defined above, it also 
simultaneously involves other factors. Indeed, 
the structuring process may be as much a 
function of external, situational factors, to which 
the subject is responding, as of internal factors. 



How these various factors combine is only
imperfectly understood in the scientific study 
of perception; they have not, to the writer's 
knowledge, been investigated in relation to the 
TAT pictures. In spite of these facts, for the past 
60 or more years users of projective techniques 
have continued to assume that responses to 
various stimuli represent projection only. 

Cattell (796) has suggested that "projective"
tests (which he thinks should be called "misperception
tests") should employ stimuli of a much
lower order of complexity than those of the TAT 
and the Rorschach inkblots in order to simplify 
interpretation. Technically this may be an im- 
provement, as Cattell has shown in the misper- 
ception tests which he designed for his objective 
test batteries. In these tests the subject's latitude 
of response to a specific ambiguity (e.g., estimating
the number of communist party members
in the United States or the value of a college 
degree) is extremely limited. A similar con- 
clusion is also implicit in the modifications of 
the TAT pictures made by McClelland (798) in 
his studies of motivation measurement in fantasy. 

In a complex projective technique such as 
the TAT, the story produced by a subject may 
represent his response to the entire picture or 
only to certain parts of the stimulus picture. In 
addition, the story itself necessarily requires 
technical interpretation by the examiner to the 
extent that it employs idiosyncratic language, 
symbols, and ideation. Because of the freedom 
and informality of the method, which is deliberate 
(in order to avoid prompting or the addition of
extraneous variance contributed by the examiner), 
it is virtually impossible to relate responses to 
specific internal and external cues or patterns of 
cues. 

The very looseness of the interpretative 
procedure, in contrast to fixed scoring keys in 
the case of questionnaires (usually answered 
"yes," "no," or "?"), led George Kelly (797), in 
an Annual Review article, to observe that while 
in the case of questionnaires the subject tries to 
guess what the examiner is thinking, in projective 
techniques the examiner must guess what the 
subject is thinking. In either case, there is a good 
deal of guessing going on. 

The TAT has some similarity to the Draw-A-Man Test in that the Draw-A-Man provides an unstructured stimulus (the instruction to draw a person) and permits wide latitude of response structuring on the part of the subject. It is noteworthy that the Draw-A-Man has produced no acceptable schemes for personality interpretation. However, as pointed out in the discussion of the Draw-A-Man, the most promising results in personality, as well as in cognitive assessment, have been those employing detailed, objective techniques of scoring, such as the point scales.

The selection of five cards of the TAT for the Survey undoubtedly reflects (1) the appraisal of existing personality tests mentioned above, combined with (2) the recognition of the apparently widespread acceptance of the TAT as a projective technique and (3) the belief that an appropriate method of objective scoring of responses to them can be developed for the specific use of the Survey as well as for later, more general use by professional workers. The basis for this appraisal cannot be documented here, although the writer is prepared to defend it; reference to the forthcoming Sixth Mental Measurements Yearbook (O. Buros, ed., New Brunswick, N.J., The Gryphon Press) might be sufficient for this purpose. The evidence for the widespread acceptance of the TAT is discussed below, together with an evaluation of the prospects for successful development of an objective scoring procedure.

REVIEW OF THE LITERATURE 
ON THE TAT 

The present review includes abstracts of published research articles, theses, and critical reviews of the TAT literature, as well as 5 general references on the thematic apperception method. These constitute only a small portion of the extensive psychological, anthropological, and sociological research on the TAT and its variants, which has appeared in undiminished quantity over the years (e.g., Thompson's Negro edition of the TAT, Symonds' Picture Story Test, Bellak's Children's Apperception Test (CAT), Van Lennep's Four Picture Test, Phillipson's Object Relations Technique, and numerous other techniques which can be traced to the Murray version). Both the TAT procedure and the Murray "need-press" concepts have been used extensively in personality studies and studies of motivation. The items selected for inclusion in this report were judged relevant if they (1) used a measurement approach, (2) were validation or normative studies, (3) had an applicable sample in terms of age, or (4) used an adequate scoring procedure.

Overview 

Treatment of the TAT by different writers ranges from uncritical acceptance on the basis of a priori assumptions, illustrated by Henry (749) and Piotrowski (702), through qualified acceptance with a "soft" attitude toward the contradictory evidence, as demonstrated by Mayman (701) and Lindzey (703), to objective evaluation, illustrated by Eron (706), Windle (704), and others. Windle's comment, that there is little agreement among results reported by different investigators, seems to describe this field of research accurately. One area in which some agreement may be found, however, is that of cognitive evaluation (714 and 737-739); this is highly reminiscent of the Draw-A-Man.

The TAT literature abounds in elaborate scoring systems, most of which have not been critically tested. Most of these are too extensive for brief summarization and go beyond the purposes of this report. However, they have been reviewed in anticipation of a further empirical study of the Survey's Thematic Apperception Test data, and references to 21 additional selected reports are included in the bibliography of section IV.

Most of these, as well as a number of other 
suggested analytic methods of scoring the TAT, 
are well summarized in a 1951 publication by 
Edwin S. Shneidman, Walther Joel, and Kenneth B. 
Little (800). Although the modes of analysis vary 
in detail and in terminology, the typical one in- 
volves interpretation and frequency counting or 
evaluation on a rating scale of all or part of the 
following types of information, usually across all 
of the stories obtained for a selection of cards. 
(The full series of cards is often abridged because 
of practical time limitations, as it is in the 
Survey.) 

Formal (structural) aspects of the stories 

Compliance with instructions (including card 
rejection) 
Consistency of stories

Length of stories; vocabulary level 

Grammatical forms (nouns, pronouns, verbs, 
incomplete sentences) 

Number and type of situations described 

Number and type of characters included

Outcome of stories 

Level of response (from description to imaginative interpretation)

Interpretive categories 

Feelings, moods, worries, emotional tone

Needs expressed (or implied) 

Conflict areas 

Presses — physical, emotional, mental, eco- 
nomic, social, religious 

Characters— strivings, attitudes, obstacles, 
barriers, traits, and roles of hero, major 
characters, and minor characters 

Outcomes reflecting success, failure 

Thematic content — family dynamics, inner 
adjustment, sexual adjustment, interpersonal 
relations, aggression (physical, nonphysical) 

Developmental level in Freudian (psycho- 
sexual) context 

Defense mechanisms utilized 

Manner in which environment is assimilated 

The number of variables enumerated under 
these categories is extensive (Murray's need- 
press system alone exceeds 83), and in most
cases the variables require detailed, careful 
definition and intensive training of scorers. High 
reliabilities have often been achieved among 
scorers within a particular laboratory for a given 
period of tenure of the staff members involved, 
but these have not generally been maintained with 
staff changes or when systems have been tried 
out at other institutions. Often, definitions change 
over time as new generations of protocols appear, 
requiring decisions in relation to categories 
developed on the basis of earlier samples. 



In spite of the logical appeal of these analytic approaches (from some theoretical positions), they do not fit the requirements of psychometric procedures. Such analytic approaches satisfy the needs of various clinicians or investigators in their individual practices or research, but for survey purposes they are useful primarily because they suggest areas which may be suitable for objective study. With the exception of some formal characteristics (such as length of story and other items that can be counted fairly accurately), which have been related to developmental rather than personality-adjustment concepts, there is so little agreement in the literature on most scoring categories that an investigator seeking to develop an objective scoring procedure might as well start "from scratch."
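As an illustration of the kind of formal characteristic that can be counted fairly accurately, the sketch below tallies story length in words, first-person pronouns, and question marks for one transcribed story. The word definition (runs of letters and apostrophes), the pronoun list, and the function name are assumptions made for this example, not categories defined in the studies reviewed.

```python
import re

# Assumed list of first-person pronouns; contractions such as "I'm" are not
# counted by this simple version.
FIRST_PERSON = {"i", "me", "my", "mine", "myself",
                "we", "us", "our", "ours", "ourselves"}

def formal_counts(story):
    """Count a few formal (structural) features of one transcribed story."""
    words = re.findall(r"[A-Za-z']+", story)
    return {
        "words": len(words),
        "first_person_pronouns": sum(w.lower() in FIRST_PERSON for w in words),
        "questions": story.count("?"),
    }

print(formal_counts("I think the boy is sad. Why is he sad? My guess is the violin."))
```

Counts of this kind can be scored identically by any clerk or program, which is what makes them attractive for a survey, although each definition (what counts as a word, which pronouns are included) would still have to be fixed in a scoring manual.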

Research Demonstrating
Developmental Factors 

Edelstein (737) completed an interesting pilot study demonstrating a system for scoring TAT stories. From her system a total age-adjusted score, correlating well with Stanford-Binet IQ's, could be derived. She used the following six scoring categories: number of words, qualifier/word ratio, number of conditions, number of responses, number of situations involved, and number of characters. Her sample included only 15 boys and 13 girls (ages 9-5 to 12-5), but from a methodological viewpoint her study is promising.

In a conceptually related study, Armstrong 
(714) administered the CAT (cards 1, 2, 4, 8, and 
10) to a sample of 60 children in grades 1 to 3 in 
the University of Minnesota elementary school. 
The findings of her study relevant to the present 
review are as follows: (1) length of story in- 
creases with grade, (2) girls' protocols are 
longer than those of boys, (3) the use of first 
person pronouns shows a slight but consistent 
decline with grade progression, (4) girls tend 
to make more subjective and personalized state- 
ments than boys, and (5) girls have a consistently 
longer reaction time than boys. 

Slack (761) gave the TAT to 15 exogenous feebleminded boys and 12 endogenous ones at the Vineland Training School. He correlated a score reflecting the number of causally and purposefully connected statements with the Stanford-Binet and with Thurstone's test of Primary Mental Abilities (PMA). With chronological age (CA) held constant, causally or purposefully connected statements correlated with mental age (MA) scores as follows: S-B MA, 0.58; PMA MA, 0.70; PMA Verbal MA, 0.51; PMA Motor MA, 0.72. Length of stories (number of words) correlated as follows with the same variables (CA held constant): S-B MA, 0.31 (ns); PMA MA, 0.34 (ns); PMA Verbal MA, 0.53; PMA Motor MA, 0.48. The age-corrected correlation of number of purposeful relations with the PMA Verbal MA was 0.90, and the correlation of number of causal relations with the same measure was 0.42. Slack also reported a significant difference between the endogenous and exogenous groups on length of stories.
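The phrase "CA held constant" refers to a partial correlation. The sketch below gives the standard first-order partial-correlation formula and the usual t test for a partial correlation. The sample size of 27 is an assumption (the 15 plus 12 boys named above, on the supposition that all were included in each correlation), and the critical value 2.064 is the two-tailed 0.05 point of the t distribution with 24 degrees of freedom. Under these assumptions the word-count correlations of 0.31 and 0.34 fall short of significance while 0.48 and 0.53 exceed it, which is consistent with the "(ns)" labels Slack reported.

```python
from math import sqrt

def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

def partial_r_t(r, n, controls=1):
    """t statistic and degrees of freedom for a partial correlation."""
    df = n - 2 - controls
    return r * sqrt(df / (1 - r ** 2)), df

T_CRITICAL = 2.064  # two-tailed .05 critical value of t with 24 df

# Word-count correlations reported by Slack (CA held constant); n = 27 is assumed.
for label, r in [("S-B MA", 0.31), ("PMA MA", 0.34),
                 ("PMA Verbal MA", 0.53), ("PMA Motor MA", 0.48)]:
    t, df = partial_r_t(r, n=27)
    verdict = "significant" if abs(t) > T_CRITICAL else "ns"
    print(f"{label}: r = {r:.2f}, t({df}) = {t:.2f}, {verdict}")
```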

These studies lend some limited support to 
the possibility of developing an objective scoring 
system based on developmental criteria for the 
five TAT pictures used in the Survey. 

Other Relevant Research 

The following studies were selected for cita- 
tion on the basis of their relevance to the Survey 
problems. Lesser (720) demonstrated how a 
Guttman-type scale could be developed for 
measurement of aggressive fantasy. Bijou and 
Kenny (732) and Murstein (734) investigated 
ambiguity values of TAT cards. The former found 
the following ambiguity ranks (out of 21) for the 
four picture cards used in the Survey (card 16, 
blank, was not rated): 

Card number    Rank
1              2
2              3
5              17
8BM            11

The latter reported that cards with medium ambiguity (8BM) were most "productive" of thematic content among college students.
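Lesser's Guttman-type scale is not described in detail here, but the central check for any such scale is the coefficient of reproducibility: the proportion of item responses that can be reproduced from each subject's total score alone. The following sketch counts errors as departures from the ideal pattern implied by each total score (one common counting convention; others exist). The 0/1 item coding and the example data are hypothetical. By convention, a coefficient of about 0.90 or higher is usually required before a set of items is treated as a scale.

```python
def reproducibility(responses):
    """Guttman coefficient of reproducibility for 0/1 item responses.

    responses: one list per subject, with items in a fixed order.
    Items are ranked from most to least frequently endorsed; a subject
    with total score k is expected to endorse exactly the k most
    frequently endorsed items. Every departure counts as one error.
    """
    n_items = len(responses[0])
    frequency = [sum(r[i] for r in responses) for i in range(n_items)]
    order = sorted(range(n_items), key=lambda i: -frequency[i])
    errors = 0
    for r in responses:
        expected = set(order[:sum(r)])
        errors += sum(r[i] != (i in expected) for i in range(n_items))
    return 1 - errors / (len(responses) * n_items)

perfect = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
imperfect = [[1, 1, 0], [1, 0, 0], [0, 1, 0]]
print(reproducibility(perfect), reproducibility(imperfect))
```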

Milam (735) demonstrated the sensitivity of TAT responses to examiner influence. Apparently, the attitudes and behavior of the examiner, as perceived by the subject, account for some of the variance in the TAT responses. This is true of all psychological tests. It is not possible to say whether this is a greater problem on the TAT than on the WISC, for example, but it must be kept in mind as a significant source of uncontrolled variation.

Gurevitz and Klapper (763) found that schizo- 
phrenic children characteristically respond to 
CAT cards with bizarre outcomes, evaluation of 
stimuli, use of titles, hostility, and verbosity. 
Holden (766) compared a small sample of cerebral 
palsied children with normal controls. His results 
clearly suggest that cerebral palsied respondents 
tend to describe the cards, while normal controls 
give more thematic content. The average number 
of descriptions (out of 10 cards) was 6.U for the
palsied children and 2.8 for the controls. Leitch 
and Schafer (770) reported a number of response 
criteria identifying psychotic responses. 

From the standpoint of further research on 
the development of a scoring procedure for the 
TAT, the following list of specific items has been 
recorded and evaluated in one or more of the 
studies reviewed (reference numbers shown in
parentheses). In most cases the results were not 
included in the main discussion either because of 
sample limitations, subjective methods of scoring, 
inconclusiveness of results, or unrelatedness to 
the present problem. Many of them, however, do 
appear definable and worthy of further study. 

Frequency and duration 

RT latency (705 and 747) 

Total reaction time (705 and 747) 

Number of words (707, 714, 737, 741, 746, 
747, and 764) 

Number of adjectives (737) 

Number of adverbs (737) 

Number of nouns (714) 

Number of pronouns (714) 

Number of verbs (714) 

Number of questions (705) 

Number of ego words (714) 

Number of situations (737) 

Number of characters (707 and 737) 
Male, female 
Nature of action 

Crying (718) 

Dancing (737) 

Disaster (713) 

Drunkenness pV) 
Escape solutions (705 and 718)

Fear of punishment (742) 

Fighting (720) 

Hardship (713) 

Illness (713) 

Loss of ability, skill, money (737)
Suicide (705) 
Frightening (737) 
Killing (720) 
Ridiculing (720) 
Making fun of (737) 
Punishment (705 and 743) 
Stealing (737) 
Receiving aid (705) 
Giving aid (705) 
Teaching (737) 
Laughing (737) 
Singing (737) 

Book or movie cited as source (705) 
Criticism of picture (705) 
Liked, disliked (705) 
Title (763) 

Number of themes (707, 712, and 764)

Card description 

Parts referred to (705) 

Number of rare picture details (705) 

Compliance with instructions (705, 707, and 721)

Examiner included in story (770) 
Response 

Bizarre (705 and 763) 

Queer (770) 

Contradictory (770) 

Incoherent (705 and 770) 

Transcendental (707 and 714) 
Number of references 

Future events (705 and 721) 

Past events (705 and 721) 

Present events (705 and 721) 
Level (712, 721, 755, 766, and 776)

Enumerative 

Descriptive 

Interpretive 
Language 

Neologisms (770) 

Stereotyped (705) 

Vocabulary level (705) 

Unusual wording (770) 

Fluency (705) 
Repetitions (770)

Foreign expressions 
Relative age of characters (705) 

Older 

Peer 

Younger 
Sex role identification (705) 

Own 

Opposite 
Ambiguous 
Tone of story (712) 
Emotional 
Submission to fate 
Rebellion 
Fear 
Worry 

Lack of affect 
Aspiration 
Shift of tone 
Theme of story 
Unrelated (770) 
Curiosity (738) 
Scorning (720) 
Social approval (713) 

Positive 

Negative 

Evasive 
Stressful (725) 

Ordinary family activity (712) 

Mental inadequacy (713) 

Motivational inadequacy (713) 

Physical inadequacy (713) 
Perceptual distortions (705, 712, and 770) 
Neatness or orderliness of story (705) 
Overspecific statements (770) 
Overgeneralizations (770) 
Autistic logic (770) 
Feelings 

Anger toward parent(s) (743) 

Aesthetic (705) 

Ambivalent (705) 

Benign (705) 

Conflict (705) 

Empathy (723) 

Frustration (705 and 713) 

Guilt (705 and 713) 

Happiness (747) 

Hate (720) 

Independence (713) 

Inferiority (705)



Paranoid (705) 

Parental anger to child (743) 

Pleasant (705) 

Pleasure (713) 

Sadistic (705) 

Security (713) 

Number of causal relations (761) 
Number of purposeful relations (761) 
Outcomes (713, 763, 772, and 775) 
Failure 

Success 

Aggressive (772) 
Clarity of statement (705) 
Bizarre (763) 
Self-reference (705) 

Number of personalized statements (705 and 
714) 

Degree of response certainty (705) 
Level of interpretation (Eron, 712) 

Symbolic 

Abstract 

Descriptive 

Unreal 

Fairy tale 

Central character not in picture 

Autobiographical 

Continuations 

Alternate themes 

Comments 

Denial of theme 

Rejection 

Peculiar 

Confused 

Includes examiner in story 

No connection between story and picture 

Humorous 

PROSPECTS FOR DEVELOPING 
AN OBJECTIVE SCORING KEY 
FOR THE SURVEY'S TAT 

Although the TAT literature is scientifically "sloppy" in comparison with the material reviewed in relation to the WISC and the Draw-A-Man Test, the following assumptions seemed warranted: (1) a substantial number of items (both formal-structural and thematic-interpretive) can be reliably defined and accurately scored, (2) discriminating developmental criteria can be devised, and (3) an objectively defined scoring system can be developed which will contribute useful information regarding development between ages 6 and 12 years.

It seems unlikely, in light of the literature 
reviewed, that scoring scales can be constructed 
which will measure factors such as motivation, 
affective states, and personality traits. However, 
this is not serious since there is no indication that 
these factors have any developmental impli- 
cations. 

The anticipated developmental scales would 
greatly enrich the information obtained in the
Survey by possibly providing developmental norms 
with regard to behavioral aspects not encompassed 
by the other tests, such as verbal expression, 
thematic content of imagination in standard test 
situations, associations to standard stimuli, role 
concepts and attitudes in relation to self, peers of 
same and opposite sex, parental and adult figures, 
and common cultural values. 

While the picture samples are limited, they 
appear to be well chosen for the purpose. Card 1 
has a boy as the central figure; card 2, a girl; 
card 5, an adult-parental (mother) figure; and 
card 8BM, a possible stressful situation (involving a father figure) within the experience background of most school-age children. Card 16, the
blank card, is completely unstructured. As a set 
of cards having nearly universal applicability in 
a United States national sample, the selection 
appears excellent. 

One of the advantages that an investigator 
working on this problem would have over most of 
those who have published reports in this area is 
the large sample obtained under standardized 
survey conditions. With adequate funds to work 
with a fairly large sample of perhaps 1,000 or 
more cases, a good test of these conclusions 
could be made. Of course, there is no guarantee 
that the results will be entirely satisfactory, 
although the prognosis appears good. 

However, the Survey is committed to doing 
something with these data, and no suitable scoring 
procedure is presently available. In the writer's 
judgment, the options available were nearly all 
unsatisfactory, and the one taken may prove to be 
a wise decision. 
V. TOTAL PSYCHOLOGICAL TEST BATTERY



The foregoing reviews of the several components of the Survey's psychological test battery have discussed the strengths and weaknesses of each test and the problems involved in estimating population parameters on a national scale from the sample data. In each case a number of specific problems were raised, and suggestions for treatment of data or for further research have been made in the respective sections of the report. However, the most important common problem derives from the examination of the standardization basis of these tests. The norms for the WISC are unquestionably the most satisfactory, with the Draw-A-Man being second; the adequacy of the Wide Range Achievement Test norms has been questioned (see section II). Finally, new norms, related to the scoring system to be developed for the TAT, are yet to be constructed.

In order to achieve the soundest possible basis for population estimates with this battery, it is recommended that new national norms, based on the total Survey sample, be developed for all of the tests before any final population estimates are published. While some preliminary estimates may be warranted, using norms provided by the test publishers, the discussions in the individual sections of the report point up the necessity of the recommended restandardization.

In the event that this work cannot be fully supported, the order of priority indicated by the review would place the reanalysis of the WRAT first, the Draw-A-Man Test second, and the WISC third. It is assumed that this must be done for the TAT when a new scoring procedure is completed and adopted.

The issues in relation to the WRAT are as 
follows: (1) No adequate sampling plan was fol- 
lowed in standardizing the 1963 revision, and, in 
fact, the bias of the sample is clearly mentioned 
in the manual. (2) The test scores used to compile 
the sample by levels are not equivalent; therefore, 
only limited confidence can be placed in the re- 
sulting norm levels, even though substantial 
correlation of the WRAT scales with concurrent 
criteria appears likely. 

In the case of the Draw-A-Man Test, it is 
recognized that (1) the Goodenough norms are 
outmoded, and that (2) the use of the Harris 



norms (which is recommended) without analysis 
of the raw score distributions on the national 
sample might lead to some errors. The adminis- 
tration of the Draw-A-Man Test in the Survey 
was different from that recommended by Harris, 
and it would be prudent to proceed empirically 
rather than to assume that the Survey drawings
are equivalent to those of the standardization
sample. In addition, Harris' own norms do not
reflect as good a national sample as even the
WISC, for which further standardization is
unquestionably justified.

One of the major problems with the WISC
subtests is that of examining further the optimal
basis for estimating Full Scale IQ's from the
Vocabulary and Block Design scores. Even if
restandardization should reveal no need for
rescaling the subtest items, the adoption of
published conversion tables or direct proration
is considered unjustified without further
research. This is discussed in more detail in
section I.
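The concern about proration can be made concrete. One common way to turn two subtest scaled scores into an IQ-like score is to treat their sum as a new composite. That composite is then rescaled using its own mean and standard deviation. The result depends on how strongly the subtests intercorrelate, so a value borrowed from another sample may not fit the Survey data. The sketch below assumes WISC scaled scores with mean 10 and SD 3. It also assumes an average intercorrelation supplied by the analyst; the value 0.5 in the usage lines is purely illustrative. The sketch says nothing about how well the composite predicts the Full Scale IQ, which is the further research the text calls for.

```python
from math import sqrt


def short_form_iq(scaled_scores, mean_r, scaled_mean=10.0, scaled_sd=3.0):
    """IQ-like deviation score for the sum of k subtest scaled scores.

    mean_r is the assumed average intercorrelation of the subtests; it should
    be estimated from the sample the scores come from.
    """
    k = len(scaled_scores)
    composite = sum(scaled_scores)
    composite_mean = k * scaled_mean
    # SD of a sum of k equally intercorrelated scores, each with SD scaled_sd.
    composite_sd = scaled_sd * sqrt(k + k * (k - 1) * mean_r)
    return 100.0 + 15.0 * (composite - composite_mean) / composite_sd


if __name__ == "__main__":
    # Hypothetical child: Vocabulary = 13, Block Design = 13 (each +1 SD).
    print(round(short_form_iq([13, 13], 0.5), 1))  # 117.3 with r = 0.5
    # Averaging the two z-scores would give 115; that matches only when r = 1.
    print(round(short_form_iq([13, 13], 1.0), 1))  # 115.0
```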

The information expected from the test 
battery may be summarized as follows: 

1. WISC Vocabulary score. This test
individually provides a good estimate of "g,"
the common "general intelligence" factor
in the WISC, and may be accepted as a
good measure of the verbal component
of the general measure of intelligence.

2. WISC Block Design score. This test is
also well saturated in "g" and second only 
to Vocabulary in reliability. It should be 
accepted as a strong nonverbal intelli- 
gence test and as an estimate of the non- 
verbal component of the full test. 

3. Draw-A-Man Test: Goodenough-Harris
standard score. The Goodenough-Harris
standard score (preferably restandardized
on the total Survey sample) can be
interpreted as a deviation IQ, in a manner
comparable to the WISC IQ's. This score
is a reliable and reasonably valid
nonlanguage measure of mental maturity.

4. WRAT Oral Reading: grade equivalent (Rq).

5. WRAT Oral Reading: standard score (Rss).

6. WRAT Arithmetic: grade equivalent (Aq).

7. WRAT Arithmetic: standard score (Ass).
Both the grade equivalents and the standard
scores will be useful for the WRAT Reading and
Arithmetic subtests (particularly if they are
restandardized on the total Survey sample). The
grade equivalents will permit assessment of
school retardation, while the standard scores,
which have the same characteristics as
deviation IQ's, will be more appropriate in
pattern analytic combination with the WISC and
Draw-A-Man scores.

8. TAT: developmental score(s). This may
actually be a series of scores. It is entered
"symbolically" at this time.

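The two kinds of WRAT score serve the different purposes described in the list above. This is a minimal sketch of each use. It relies on a record layout invented here for illustration; `WratResult` and its fields are hypothetical. It also assumes WRAT standard scores on the deviation-IQ scale of mean 100 and SD 15. Grade equivalents are compared with the child's actual grade placement. Standard scores are collected with the other tests' scores into one profile on a common scale. The one-grade threshold for flagging a lag is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class WratResult:
    """Hypothetical record for one child's WRAT results."""
    grade_placement: float  # current school grade, e.g. 3.2
    reading_ge: float       # Reading grade equivalent
    arithmetic_ge: float    # Arithmetic grade equivalent
    reading_ss: float       # Reading standard score (mean 100, SD 15 assumed)
    arithmetic_ss: float    # Arithmetic standard score


def school_lag(result, threshold=1.0):
    """Grade units by which achievement trails placement, with a flag per subtest."""
    lags = {
        "reading": result.grade_placement - result.reading_ge,
        "arithmetic": result.grade_placement - result.arithmetic_ge,
    }
    flags = {name: lag >= threshold for name, lag in lags.items()}
    return lags, flags


def score_profile(result, wisc_iq, draw_a_man_ss):
    """Scores that share the deviation-IQ scale, ready for pattern analysis."""
    return {
        "WISC": wisc_iq,
        "Draw-A-Man": draw_a_man_ss,
        "WRAT Reading": result.reading_ss,
        "WRAT Arithmetic": result.arithmetic_ss,
    }


if __name__ == "__main__":
    child = WratResult(3.2, 2.0, 3.0, 84.0, 98.0)
    lags, flags = school_lag(child)
    print(flags)  # {'reading': True, 'arithmetic': False}
    print(score_profile(child, wisc_iq=101.0, draw_a_man_ss=99.0))
```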
It is possible to think of these data as
providing individual profiles or patterns which
supplement information represented by the
individual scores. For example, some children may
rank high or low on all scales, indicating general
excellence or retardation in comparison with the
general population. There may also be distinctive
test patterns associated with such special
conditions as reading disability, mental
deficiency, scholastic retardation, verbal
impairment due to physical or social reasons,
behavior disorders, and cultural deprivation. If
such patterns exist, it should be possible to
identify them by a standard research design based
on discrimination of experimentally formed
criterion groups. A hierarchical grouping analysis
of score profiles, seeking to identify
characteristic profiles of groups, would be an
alternative approach.
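The hierarchical grouping alternative can be sketched with Ward's minimum-variance criterion, which is one established form of hierarchical grouping. The text does not name a particular method, so this choice is an assumption. Each child's profile starts as its own group. At each step, the two groups whose merger adds least to the total within-group sum of squares are combined. The plain-Python version below checks every pair of groups at each step. It is meant only to show the logic on a small subsample, and the profiles in the usage lines are invented.

```python
from itertools import combinations


def ward_grouping(profiles, n_groups):
    """Merge profiles until n_groups remain, using Ward's criterion.

    profiles: list of equal-length score vectors (one per child).
    Returns a list of groups, each a list of indices into profiles.
    """
    groups = [[i] for i in range(len(profiles))]
    centroids = [list(map(float, p)) for p in profiles]
    while len(groups) > n_groups:
        best = None
        for a, b in combinations(range(len(groups)), 2):
            na, nb = len(groups[a]), len(groups[b])
            dist2 = sum((x - y) ** 2 for x, y in zip(centroids[a], centroids[b]))
            # Increase in within-group sum of squares if a and b are merged.
            cost = na * nb / (na + nb) * dist2
            if best is None or cost < best[0]:
                best = (cost, a, b)
        _, a, b = best  # a < b because combinations yields ordered pairs
        na, nb = len(groups[a]), len(groups[b])
        centroids[a] = [(na * x + nb * y) / (na + nb)
                        for x, y in zip(centroids[a], centroids[b])]
        groups[a] = groups[a] + groups[b]
        del groups[b]
        del centroids[b]
    return groups


if __name__ == "__main__":
    # Hypothetical standard-score profiles: (WISC, Draw-A-Man, WRAT Reading)
    profiles = [[100, 100, 100], [102, 98, 101], [80, 82, 79], [78, 80, 81]]
    print(ward_grouping(profiles, 2))  # [[0, 1], [2, 3]]
```

As the text notes, the criterion characteristics of the resulting groups would be examined only after the grouping is done.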

In this procedure, identification of criterion
characteristics of the groups would follow rather
than precede the main analysis. In either case,
criterion data would be obtained from record
sources within the Health Examination Survey.
In this type of analysis it might also be profitable
to explore patterns based on scores representing
discrete residuals, with common variance
partialled out and represented by an additional
variable.
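The residual-based patterns suggested above can be illustrated with the simplest possible decomposition, which is assumed here for illustration. First, standardize each test across children. Next, take the child's mean z-score as the common component, which becomes the "additional variable." Finally, keep each test's deviation from that mean as its residual. Partialling out a first principal component or a factor-based estimate of "g" would be the more rigorous variant. The function names and data are hypothetical, and the code assumes that no test is constant across children.

```python
from statistics import mean, pstdev


def standardize_columns(rows):
    """Convert each column (test) to z-scores across children.

    Assumes no column is constant (a zero SD would divide by zero).
    """
    columns = list(zip(*rows))
    stats = [(mean(col), pstdev(col)) for col in columns]
    return [[(x - m) / s for x, (m, s) in zip(row, stats)] for row in rows]


def level_and_residuals(rows):
    """Split each profile into a level (mean z) and residuals that sum to zero."""
    result = []
    for z_row in standardize_columns(rows):
        level = mean(z_row)
        result.append((level, [z - level for z in z_row]))
    return result


if __name__ == "__main__":
    # Hypothetical scores on three tests for four children.
    scores = [[110, 110, 110], [90, 90, 90], [110, 90, 100], [90, 110, 100]]
    for level, residuals in level_and_residuals(scores):
        print(round(level, 2), [round(r, 2) for r in residuals])
```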

Computer programs for these types of analy- 
sis are available, and such studies could be con- 
ducted economically on subsamples of the Survey 
sample. 

The inclusion of these psychological tests in 
the National Health Survey was a very important 
step which has tremendous practical value to the 
health, education, and welfare fields and which 
also has immense scientific value in the life 
sciences concerned with child development. De- 
spite the technical criticisms, which are in- 
evitable in a problem of the magnitude of this 
national survey, the tests have been judged to be
either a good choice or at least an eminently 
reasonable compromise with reality within the 
constraints of the Survey. 

The research recommended should be looked
on as an unprecedented opportunity to contribute
toward adequate mental measurement of children.
It is important for those working in this Survey
to bear in mind that this is the first general
survey of psychological functions of children
ever conducted on a sophisticated national sample.
The standardization programs for the tests
reviewed, and for others referred to, fail to
qualify for this distinction. National psychological
surveys of adults were made in both World Wars,
and recently a national survey of adolescents was
conducted by Project TALENT. However, Cycle II
is, to the writer's knowledge, the first one of its
kind in the age range of 6 to 12 years.



VI. CROSS-DISCIPLINARY ANALYSES 



The complete data of Cycle II may be regarded
as composing a matrix of several thousand
variables (specific measures or components of
measurement procedures) over a sample of nearly
8,000 children. In the processes of data reduction
and analysis, many of these variables will remain
in the matrix without further manipulation (e.g.,
height, weight, body temperature, family income
level, twin status, number of siblings, and ages
of parents). Some will require prescheduled
analysis and computation of indexes according to
established procedures in the respective fields
(e.g., visual acuity, exercise tolerance, and
electrocardiogram), while others will require
extensive processing on the basis of empirically
constructed or revised scoring keys and norms,
as in the case of the psychological tests
discussed in this review.

Upon completion of segmental analysis of each 
testing and examining procedure and reduction of 
all data to indexes and primary variables, it would 
be desirable to consider multivariate analysis of 
the resulting matrix. This type of approach will 
undoubtedly reveal many significant interrelation- 
ships not previously investigated because of lack 
of appropriate data. It is premature to consider
it now, however, before the reduced data schedule
is more definitely known.

The primary purpose of the present dis- 
cussion is to explore possible linkages between 
the psychological tests in the Survey battery and 
other variables. This, too, is a formidable task,
but some important areas of investigation are 
opened up by this Survey, and these opportunities 
for significant research deserve special mention. 

DATA AVAILABLE 

From various sources within the Survey, data
on items such as the following, which have
important behavioral implications, will be available:

Parents: age, nativity, education, income level,
language spoken, psychiatric history, marital
status, handedness, and use of medical care.
(The distributions of these variables are of
interest. In addition, an SES index of
socioeconomic level can be derived.)

Siblings: number, twins, ages, education, marital
status, work status. (From these data an
additional variable, birth ordinal position,
can be derived.)

Family: size, living status, ethnic classification,
race, SES.

Child, school information: grade placement;
progress rate; absences; characterization as
requiring special provision for hard of hearing,
visually handicapped, speech therapy,
orthopedically handicapped, gifted, slow
learning, mentally retarded, emotionally
disturbed; description in relation to adjustment,
attention, interpersonal relations, discipline,
popularity, intellectual ability, academic
performance. (These data are worthy of some
detailed analysis in order to formulate
external rating criteria for independent test
validation and to derive further indexes, such
as peer rejection, based on interpersonal
relations and popularity; general adjustment;
and general adequacy, based on a frequency
count of negative citations.)

Child, medical history: prenatal and birth
circumstances, food habits, enuresis,
thumbsucking, age of walking, talking, early
learning rate, attendance at kindergarten,
experience of unconsciousness, bad burns
(with resulting scars), serious illness,
weakness, nightmares, sleeping arrangements,
age at puberty (girls). (Frequency distributions
of these items should be of great interest,
particularly those of food habits, which would
also provide a basis for judging food
idiosyncrasies, and of sleeping arrangements,
which should correlate with SES but may also
relate to other variables. Correlations of
many of these items with other data may be
extremely important, as, for example, in the
investigation of sequelae of early
unconsciousness and the development of a growth
retardation classification, a disturbance
index, and a "weakness" index.)

Child, sensory and motor indexes: visual acuity,
color vision, hearing indexes, handedness,
grip strength, vital capacity, exercise
tolerance.

Child, body measurements: height, weight,
anthropometry, X-ray, dentition.

Child, psychophysiological indexes: blood
pressure, temperature, electrocardiogram,
phonocardiogram.

Child, medical findings: health status, pathology.

Child, psychological tests: IQ estimates; verbal
ability level; performance ability level;
reading, arithmetic, maturity level; adjustment
index.
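Two of the derived variables mentioned in the list above, birth ordinal position and general adequacy, can be computed directly from the recorded items. The sketch assumes that the ages of the siblings are available in years, alongside the examined child's age. It also assumes that the school report's ratings are coded as text labels. The set of labels treated as a "negative citation" is hypothetical. With ages recorded in whole years, a sibling of the same age (a twin, or one born within the same year) is not counted as older. Ages in months would therefore give a cleaner ordinal position.

```python
NEGATIVE_RATINGS = {"poor", "below average"}  # hypothetical coding of the school report


def birth_order(child_age, sibling_ages):
    """Ordinal position, 1 = firstborn: one plus the number of older siblings."""
    return 1 + sum(1 for age in sibling_ages if age > child_age)


def general_adequacy(ratings):
    """Frequency count of negative citations across rated areas (higher = less adequate)."""
    return sum(1 for value in ratings.values() if value in NEGATIVE_RATINGS)


if __name__ == "__main__":
    print(birth_order(8, [12, 10, 5]))  # 3
    teacher_report = {"adjustment": "poor", "attention": "average",
                      "popularity": "below average", "discipline": "good"}
    print(general_adequacy(teacher_report))  # 2
```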

ANALYSES INDICATED 

The organization and ordering of the lines of 
analysis suggested in this section are tentative 
and are not intended to suggest priorities. In 
most cases, further study of the literature in the 
particular areas and consultation with qualified 
professional persons would be appropriate before 
committing time and funds to particular studies. 
Nevertheless, the richness of this "data bank" is
recognized as a source of new scientific knowledge,
and it is hoped that it can be adequately exploited.

Growth Indexes 

It is expected that mean growth indexes for
boys and girls will be computed for as many 
functions as possible over the six age periods. 
Analysis of relations among growth trends— 
separately for boys and for girls — and of growth 
rate patterns would be of direct interest and 
would also permit comparison of pattern indexes 
with psychological test scores. Sex differences in 
growth patterns and relations of sex-related 
patterns to test scores are also of great interest. 
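The growth indexes described above start from mean values by sex and age. The following is a minimal sketch. It assumes records of the form (sex, age in years, measurement), and the values used are invented. It tabulates the mean for each sex at each age, together with the difference from the preceding age group. Because the Survey examines each child once, these differences describe cross-sectional trends between age groups rather than individual growth rates.

```python
from collections import defaultdict
from statistics import mean


def growth_table(records):
    """records: (sex, age_years, value) tuples.

    Returns {sex: [(age, mean_value, change_from_previous_age_or_None), ...]}.
    """
    cells = defaultdict(list)
    for sex, age, value in records:
        cells[(sex, age)].append(value)
    table = {}
    for sex in sorted({s for s, _ in cells}):
        ages = sorted(a for s, a in cells if s == sex)
        rows, previous = [], None
        for age in ages:
            m = mean(cells[(sex, age)])
            rows.append((age, m, None if previous is None else m - previous))
            previous = m
        table[sex] = rows
    return table


if __name__ == "__main__":
    # Hypothetical grip-strength values (kg) for two age groups.
    data = [("M", 6, 20), ("M", 6, 22), ("M", 7, 24), ("F", 6, 19), ("F", 7, 22)]
    for sex, rows in growth_table(data).items():
        print(sex, rows)
```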

Other Factors Related to Test Scores 

Discriminant pattern analyses might be
undertaken systematically in a multivariate design
to investigate parental, sibling (including birth
order and twin resemblance for the twin sample),
family, school, medical-history, sensory and motor,
anthropometric, psychophysiological, and
medical-findings correlates of psychological test
scores. While this recommendation may appear
forbidding in magnitude, the multivariate approach
is actually more efficient and economical in total
perspective than piecemeal analyses. Among the
studies implied in this broad prescription are the
following types of investigations:

1. Reading disability. Effects of visual and 
auditory impairment; handedness; SES; 
growth trends; developmental history; 
early, recent, and continuing emotional 
disturbance; illness; birth order, etc. 

2. Mental retardation. Every item in the 
above enumeration is potentially related 
to mental retardation. 

3. School retardation. Same as above. 

4. Analyses of discrepancies between actual
and predicted status in relation to con- 
comitant or associated factors. These 
data offer an excellent opportunity to look 
for significant variance associated with 
overachievement and underachievement in 
school grade placement, reading achieve- 
ment (WRAT and school report), scho- 
lastic achievement (school report, WRAT 
Arithmetic), and peer relations (deviation 
from central tendency). 
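For the discrepancy analyses in item 4, "predicted status" needs an explicit prediction. One simple approach, which is assumed here rather than taken from the text, is to regress an achievement score on an ability score across the sample. Each child's residual from that regression is then treated as the discrepancy. Positive residuals suggest overachievement, and negative ones suggest underachievement. Using the regression residual rather than a raw difference between the two scores avoids labeling children as discrepant merely because of regression toward the mean. The data below are invented, and a full analysis would use several predictors.

```python
from statistics import mean


def fit_line(x, y):
    """Least-squares intercept and slope for predicting y from x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def achievement_discrepancies(ability, achievement):
    """Actual minus predicted achievement for each child (regression residuals)."""
    intercept, slope = fit_line(ability, achievement)
    return [a - (intercept + slope * q) for q, a in zip(ability, achievement)]


if __name__ == "__main__":
    iq = [90, 100, 110, 120]             # hypothetical WISC estimates
    reading_ss = [95, 95, 110, 120]      # hypothetical WRAT Reading standard scores
    for q, r in zip(iq, achievement_discrepancies(iq, reading_ss)):
        print(q, round(r, 1))
```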

While more detailed and specific investi- 
gations could be enumerated, it is more con- 
structive to emphasize the advisability of using 
the multivariate approach, since computer equip- 
ment and programs are available for such analyses 
and since results of greater value can be obtained 
at a far lower unit cost. 
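The multivariate approach favored here includes the discriminant analysis of criterion groups mentioned earlier. The following is a minimal two-group sketch using Fisher's linear discriminant, which is one standard choice and not necessarily the one the author had in mind. It weights the test scores so that the two group means are separated as far as possible relative to the pooled within-group variation. The groups, variables, and values are hypothetical. In practice the criterion groups would come from the Survey's school and medical records, and a library routine would replace the hand-written solver.

```python
from statistics import mean


def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def fisher_discriminant(group0, group1):
    """Weights w and cutoff; a profile is assigned to group1 when w.x > cutoff."""
    p = len(group0[0])
    m0 = [mean(col) for col in zip(*group0)]
    m1 = [mean(col) for col in zip(*group1)]
    pooled = [[0.0] * p for _ in range(p)]
    for group, m in ((group0, m0), (group1, m1)):
        for row in group:
            d = [x - mu for x, mu in zip(row, m)]
            for i in range(p):
                for j in range(p):
                    pooled[i][j] += d[i] * d[j]
    dof = len(group0) + len(group1) - 2
    pooled = [[v / dof for v in row] for row in pooled]  # pooled covariance
    w = solve(pooled, [b - a for a, b in zip(m0, m1)])
    cutoff = sum(wi * (a + b) / 2 for wi, a, b in zip(w, m0, m1))  # midpoint
    return w, cutoff


def score(w, profile):
    """Discriminant score of one profile."""
    return sum(wi * x for wi, x in zip(w, profile))


if __name__ == "__main__":
    # Hypothetical (WRAT Reading SS, Block Design-based SS) profiles.
    reading_disability = [[85, 95], [88, 97], [83, 99], [86, 93]]
    comparison = [[100, 98], [103, 96], [98, 100], [101, 94]]
    w, cutoff = fisher_discriminant(reading_disability, comparison)
    print([score(w, p) > cutoff for p in reading_disability + comparison])
```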



Acknowledgments

The literature review and preparation of ab- 
stracts was under the immediate direction of 
Samuel H. Cox, Research Associate at the Institute 
of Behavioral Research. Principal persons assist- 
ing Mr. Cox were Robert M. Marx, John McCrady, 
Henry Orloff, and Max S. Taggart II.

The project also was greatly expedited through the 
efforts of Miss Johnoween Gill, Reference Librarian,
Texas Christian University.

Without the loyal and competent help of these 
individuals this report could not have been com- 
pleted in only 3 months. 






GLOSSARY OF ABBREVIATIONS 



BD: Block Design subtest of the Wechsler Intelligence Scale for Children
CA: Chronological age
CAT: Children's Apperception Test
CMAS: Children's Manifest Anxiety Scale
CRT: California Reading Test
CTMM: Chicago Tests of Primary Mental Abilities
E-G-Y: Kent E-G-Y Test (Scale D, Kent Series of Emergency Scales)
FRPV: Full-Range Picture Vocabulary Test (by Ammons)
FS: Full Scale (or Full Score) of the Wechsler Intelligence Scales
g: General, or "global," intelligence factor
HES: Health Examination Survey
IQ: Intelligence quotient
M: Mean
MA: Mental age
N: Number
ns: Not significant
PPVT: Peabody Picture Vocabulary Test
PS: Performance Scale (or Performance Score) of the Wechsler Intelligence tests
R: Range
r: Correlation
RT: Response time
SAT: Stanford Achievement Test
S-B: Stanford-Binet Intelligence Scale
SES: Socioeconomic status
SRA: Science Research Associates, Inc.
SRA-PMA: SRA Primary Mental Abilities
SS: Standard score
TAT: Thematic Apperception Test
Voc: Vocabulary subtest of the Wechsler Intelligence Scales
VS: Verbal Scale (or Verbal Score) of the Wechsler Intelligence tests
WAIS: Wechsler Adult Intelligence Scale
WISC: Wechsler Intelligence Scale for Children
WRAT: Wide Range Achievement Test